Amazon Web Hosting Problem Brings Down Scores of Web Sites

Problem began at its Northern Virginia EC2 cluster early this morning

Several news outlets, including the BBC, MSNBC and ZDNet, are reporting today on problems plaguing Amazon's Elastic Compute Cloud (EC2) web hosting service that have brought down or disrupted scores of web sites, including Foursquare, Quora, Reddit and Hootsuite.

The problem began around 0141 PDT at its EC2 site in Northern Virginia, Amazon said on its AWS Service Health Dashboard, where it reported "latency and error rates with EBS [Elastic Block Store] volumes and connectivity issues reaching EC2 instances in the US-EAST-1 region."

The problem then apparently cascaded into Amazon's Relational Database Service in Northern Virginia, as well as into Amazon Web Services (AWS) Elastic Beanstalk, a service that lets developers quickly deploy and manage applications in the AWS cloud.

At 0854 PDT it said:

"We'd like to provide additional color on what we're working on right now (please note that we always know more and understand issues better after we fully recover and dive deep into the post mortem). A networking event early this morning triggered a large amount of re-mirroring of EBS volumes in US-EAST-1. This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace with which we could re-mirror and recover affected EBS volumes. Additionally, one of our internal control planes for EBS has become inundated such that it's difficult to create new EBS volumes and EBS backed instances. We are working as quickly as possible to add capacity to that one Availability Zone to speed up the re-mirroring, and working to restore the control plane issue. We're starting to see progress on these efforts, but are not there yet. We will continue to provide updates when we have them."

Amazon has posted no more recent messages saying when service will be completely restored, although some of the affected web sites say that things are getting back to normal. A Bloomberg News report at 1244 EST, however, says that "scant headway" is being made in fixing the problems.

Expect this event to raise more questions about the reliability of cloud computing.

Why the Internet Needs the InterPlanetary File System

Peer-to-peer file sharing would make the Internet far more efficient

Illustration: Carl De Torres

When the COVID-19 pandemic erupted in early 2020, the world made an unprecedented shift to remote work. As a precaution, some Internet providers scaled back service levels temporarily, although that probably wasn’t necessary for countries in Asia, Europe, and North America, which were generally able to cope with the surge in demand caused by people teleworking (and binge-watching Netflix). That’s because most of their networks were overprovisioned, with more capacity than they usually need. But in countries without the same level of investment in network infrastructure, the picture was less rosy: Internet service providers (ISPs) in South Africa and Venezuela, for instance, reported significant strain.

But is overprovisioning the only way to ensure resilience? We don’t think so. To understand the alternative approach we’re championing, though, you first need to recall how the Internet works.
