Amazon Says Overwhelmed Network Devices Triggered Outage
In a post on the AWS website, the company explains that an automated process caused the outage, which began around 10:30 AM ET in the Northern Virginia (US-EAST-1) region.
“An automated activity to scale capacity of one of the AWS services hosted in the main AWS network triggered an unexpected behavior from a large number of clients inside the internal network,” Amazon’s report says. “This resulted in a large surge of connection activity that overwhelmed the networking devices between the internal network and the main AWS network, resulting in delays for communication between these networks.”
According to the report, this issue even impacted Amazon’s ability to see what exactly was going wrong with the system. It prevented the company’s operations team from using the real-time monitoring system and internal controls that they typically rely on, which explains why the outage took so long to fix. Amazon notes that service didn’t start improving until 4:34 PM ET, and the issue was fully resolved at 5:22 PM ET.