Amazon Web Hosting Problem Brings Down Scores of Web Sites

News reports today from the BBC, MSNBC and ZDNet, among others, describe problems plaguing Amazon's Elastic Compute Cloud (EC2) web hosting service that have brought down or disrupted scores of web sites, including Foursquare, Quora, Reddit and Hootsuite.

Amazon said on its web services health dashboard that the problem began around 0141 PDT at its EC2 site in Northern Virginia, when "latency and error rates with EBS [Elastic Block Store] volumes and connectivity issues reaching EC2 instances in the US-EAST-1 region" occurred.

The problem then apparently cascaded into Amazon's Relational Database Service in Northern Virginia as well as Amazon Web Services (AWS) Elastic Beanstalk, a service that lets developers quickly deploy and manage applications in the AWS cloud.

At 0854 PDT, Amazon said:

"We'd like to provide additional color on what we're working on right now (please note that we always know more and understand issues better after we fully recover and dive deep into the post mortem). A networking event early this morning triggered a large amount of re-mirroring of EBS volumes in US-EAST-1. This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace with which we could re-mirror and recover affected EBS volumes. Additionally, one of our internal control planes for EBS has become inundated such that it's difficult to create new EBS volumes and EBS backed instances. We are working as quickly as possible to add capacity to that one Availability Zone to speed up the re-mirroring, and working to restore the control plane issue. We're starting to see progress on these efforts, but are not there yet. We will continue to provide updates when we have them."

Amazon has not posted a more recent message saying when service will be completely restored, although some of the affected web sites say that things are getting back to normal. A Bloomberg News report at 1244 EST, however, says that "scant headway" is being made in fixing the problems.

Expect this event to raise more questions about the reliability of cloud computing.

Risk Factor

IEEE Spectrum's risk analysis blog, featuring daily news, updates and analysis on computing and IT projects, software and systems failures, successes and innovations, security threats, and more.

Contributor
Willie D. Jones