Breaking Down the October 2025 AWS Outage in Plain English

If you woke up on October 20, 2025, and your app hosted in us-east-1 started acting like it forgot its own name, you weren’t alone. Half the internet basically sneezed in sync. Here’s what really went down without the 10,000-word AWS jargon-fest.


The Short Version

A small DNS bug in DynamoDB, one of AWS’s core databases, accidentally deleted the service’s own “where to find me” address.
When DynamoDB vanished from the internet’s phonebook, dozens of other AWS services that rely on it (like EC2, Lambda, and Redshift) suddenly couldn’t function.

One small DNS record went missing. The ripple effect?
Thousands of systems went, “Uh… where’s DynamoDB?”


The Chain Reaction Explained (Like We’re at a Coffee Shop)

The Root of Chaos: A DNS Glitch

Every AWS service has a DNS record. Think of it as a street address for your app.
DynamoDB’s automation system had a race condition, meaning two parts of its system tried to update DNS at the same time.
Result: one overwrote the other and wiped the main DNS entry clean. Boom. DynamoDB disappeared from the map.

So, imagine you’re Google Maps, but you just forgot where New York City is. That’s what happened here.
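If you'd rather see the race than imagine it, here is a toy model. It is not AWS's actual code: the plan numbers, the cleanup step, and every name in it (dns_table, apply_plan, cleanup_older_than) are made up for illustration. It just shows how a slow worker writing an old plan, followed by a cleanup of "old" plans, can leave a record with no addresses at all.

```python
# Toy model of a stale-write race. All names here are hypothetical.
dns_table = {}          # endpoint name -> {"plan": id, "ips": [...]}
plans = {
    1: ["10.0.0.1", "10.0.0.2"],   # older plan
    2: ["10.0.0.3", "10.0.0.4"],   # newer plan
}

def apply_plan(endpoint, plan_id):
    """Write a plan into DNS without checking whether it is stale (the bug)."""
    dns_table[endpoint] = {"plan": plan_id, "ips": list(plans[plan_id])}

def cleanup_older_than(endpoint, newest_plan_id):
    """Delete plans older than the newest; empties the record if it uses one."""
    for plan_id in [p for p in plans if p < newest_plan_id]:
        del plans[plan_id]
        if dns_table.get(endpoint, {}).get("plan") == plan_id:
            dns_table[endpoint]["ips"] = []   # active record now points at nothing

endpoint = "db.example.internal"
apply_plan(endpoint, 2)          # fast worker applies the new plan
apply_plan(endpoint, 1)          # slow worker finishes late, overwrites with the old plan
cleanup_older_than(endpoint, 2)  # fast worker tidies up plans it considers old

print(dns_table[endpoint])       # {'plan': 1, 'ips': []}
```

The usual fix for this pattern is a staleness check inside the write itself (for example, refuse to apply a plan older than the one already live), so the outcome no longer depends on who finishes first.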


EC2 Panics: “Where’s My Database?”

EC2 (the virtual machines of AWS) depends on DynamoDB to keep track of which physical servers are doing what.
When DynamoDB went dark, EC2’s backend started losing track of which machines it “leased.”
That’s like forgetting who’s renting which apartment: suddenly, you can’t give out new keys (a.k.a. launch new EC2 instances).

Existing servers were fine, but new ones? Not so much. Launch requests failed, error messages flew, and the AWS dashboard started to look like a Christmas tree of red alerts.


Network Traffic Jam

Once DynamoDB was fixed, EC2 rushed to catch up, but so many servers were checking in at once that the management system got overloaded and fell over again.
Meanwhile, the Network Manager (which sets up VPC connections) was stuck in a backlog, meaning some new servers were technically “alive” but couldn’t connect to anything.
So yeah, servers were being born into the void.
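That "everyone checks in at once" pile-up is the classic thundering herd. A common client-side defense is exponential backoff with jitter, so retries spread out instead of landing together. The sketch below is a generic illustration, not anything from AWS's internals; backoff_delays and the host names are hypothetical.

```python
import random

def backoff_delays(attempts, base=1.0, cap=60.0, rng=None):
    """Return 'full jitter' delays: each one random in [0, min(cap, base * 2**n)]."""
    if rng is None:
        rng = random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]

# Three hypothetical servers retrying; each gets its own spread of wait times,
# so they are unlikely to hit the management system at the same instant.
for server in ("host-a", "host-b", "host-c"):
    delays = backoff_delays(5)
    print(server, [round(d, 2) for d in delays])
```

The cap keeps waits from growing forever, and the randomness is the part that actually breaks up the herd.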


Load Balancers Lose Their Minds

Then Network Load Balancers (NLB) joined the party.
Because new instances weren’t fully online yet, the NLB’s health checks thought servers were dying.
So it started removing and re-adding healthy servers in a loop: basically, load balancer whack-a-mole.

AWS engineers eventually turned off automatic failover, manually stabilized things, and slowly brought services back online.


Everyone Else Feels It

Other services like Lambda, ECS, EKS, Redshift, and Connect all rely on DynamoDB or EC2 in some way.
When those core systems went down, they went down too.
It was a chain reaction that rippled through AWS like a power outage in a skyscraper: one tripped breaker took out the whole floor.


The Recovery Timeline

  • 11:48 PM PDT (Oct 19; all times here are Pacific): DynamoDB DNS fails → everything starts breaking.
  • 2:25 AM: DNS fixed.
  • 10:30 AM: EC2 networking stabilizes.
  • 2 PM: NLB and other services fully recovered.

Basically, AWS had a really bad night and a rough morning.

What AWS Is Fixing

To prevent this from happening again, AWS said they’re:

  • Fixing the DNS automation race condition that caused this mess.
  • Adding safety checks so DNS records can’t vanish entirely.
  • Improving EC2’s recovery processes and throttling logic to handle massive backlogs better.
  • Adding “velocity controls” to NLB so it doesn’t overreact and drop too much capacity during health check chaos.

In short: more guardrails, fewer ways to delete your own DNS entry by accident.
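To make "velocity controls" a bit more concrete, here is a minimal sketch of the general idea: cap how much capacity health checks are allowed to remove in one pass. This is an assumption about what such a guard could look like, not a description of how NLB implements it; cap_removals, max_fraction, and the instance IDs are hypothetical.

```python
import math

def cap_removals(unhealthy, total, max_fraction=0.25):
    """Pick which unhealthy targets may be removed right now.

    Never removes more than max_fraction of the fleet in one pass;
    the rest stay in service until a later pass re-checks them.
    """
    budget = math.floor(total * max_fraction)
    return unhealthy[:budget]

fleet_size = 8
failing = ["i-1", "i-2", "i-3", "i-4", "i-5"]   # hypothetical instance IDs
print(cap_removals(failing, fleet_size))         # ['i-1', 'i-2']
```

The trade-off: a few genuinely broken servers may keep receiving traffic a little longer, but a burst of false alarms can't drain the whole fleet.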


The Takeaway

Even the most reliable cloud can have bad days. This one was a masterclass in how a tiny automation bug in one core system can ripple across dozens of dependent services.

If your AWS architecture lives in us-east-1, this was a reminder to:

  • Use multi-region redundancy.
  • Avoid letting a single service dependency take down your whole app.
  • And maybe, just maybe, give your ops team an extra coffee after nights like that.
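
As a rough illustration of the first two points, here is a minimal region-fallback sketch. It assumes your data is already replicated to a second region, which is the hard part and is not shown. call_with_fallback and fake_fetch are hypothetical stand-ins, not a real SDK API.

```python
def call_with_fallback(regions, fetch):
    """Try each region in order; return (region, result) from the first that works."""
    errors = {}
    for region in regions:
        try:
            return region, fetch(region)
        except ConnectionError as exc:
            errors[region] = exc          # remember the failure, try the next region
    raise RuntimeError(f"all regions failed: {list(errors)}")

def fake_fetch(region):
    # Stand-in for a real client call; pretends us-east-1 is unreachable.
    if region == "us-east-1":
        raise ConnectionError("endpoint not resolvable")
    return f"data from {region}"

print(call_with_fallback(["us-east-1", "us-west-2"], fake_fetch))
# ('us-west-2', 'data from us-west-2')
```

In a real app, fetch would wrap your actual client call, and you would want timeouts so a hanging region fails fast instead of stalling the fallback.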

TL;DR

A small DNS race condition in DynamoDB nuked its endpoint → EC2 lost track of its servers → networks backed up → load balancers freaked out → half of AWS coughed.
AWS fixed it, learned from it, and probably had a lot of postmortem meetings.

Thank you for stopping by. ✌️

Source: Summary of the Amazon DynamoDB Service Disruption in the Northern Virginia (US-EAST-1) Region
