A chronological record of significant internet infrastructure failures and service disruptions in 2024. Each incident includes duration, scope of impact, and root cause analysis where one has been published.
MAJOR
OpenAI
ChatGPT and OpenAI API outage returns 503/429 errors across all tiers
ChatGPT and the OpenAI API experienced a multi-hour outage, causing 503 and 429 errors across all tiers. The outage was particularly disruptive given the high number of production applications reliant on the OpenAI API.
DURATION
~3 hours
AFFECTED
All ChatGPT users and OpenAI API customers globally
ROOT CAUSE
Infrastructure overload following a surge in demand; OpenAI published a brief incident update
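For applications downstream of an outage like this, the standard mitigation is client-side retry with exponential backoff and jitter. A minimal sketch (generic, not OpenAI's official client; assumes the requests library and the numeric-seconds form of Retry-After):

    import random
    import time

    import requests

    RETRYABLE = {429, 500, 502, 503, 504}

    def post_with_backoff(url, headers, payload, max_attempts=5):
        """POST with exponential backoff and jitter on retryable statuses."""
        for attempt in range(max_attempts):
            resp = requests.post(url, headers=headers, json=payload, timeout=30)
            if resp.status_code not in RETRYABLE:
                return resp
            # Prefer the server's own Retry-After hint when one is supplied.
            retry_after = resp.headers.get("Retry-After")
            delay = float(retry_after) if retry_after else 2 ** attempt + random.random()
            time.sleep(delay)
        resp.raise_for_status()  # attempts exhausted; surface the last 429/503

Backoff alone will not ride out a three-hour outage; pairing it with a circuit breaker keeps worker threads from idling in sleep loops for the duration.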
MAJOR
Discord
Discord outage blocks messaging and voice in North America and Europe
Discord users across North America and Europe reported being unable to send messages, join voice channels, or load servers. The outage coincided with high-traffic periods and was resolved within a few hours.
DURATION
~2.5 hours
AFFECTED
Discord users in North America and Europe
ROOT CAUSE
Network infrastructure degradation under load
CRITICAL
CrowdStrike / Windows
CrowdStrike Falcon sensor update triggers global Windows BSOD
A faulty content configuration update to the CrowdStrike Falcon sensor caused an estimated 8.5 million Windows machines worldwide to crash with a Blue Screen of Death. Airlines, hospitals, banks, broadcasters, and emergency services were among the most severely affected. The incident was widely described as the largest IT outage in history.
DURATION
Up to 10+ hours for most affected systems; some took days to recover, as remediation often required booting each machine into Safe Mode to delete the faulty channel file by hand
AFFECTED
~8.5 million Windows systems globally across 100+ countries
ROOT CAUSE
Defective channel file 291 pushed via Falcon sensor auto-update; the file caused an out-of-bounds memory read in Windows kernel space
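CrowdStrike's post-incident review described a content update whose field count did not match what the sensor's interpreter expected. The sketch below is not Falcon code, just a hypothetical Python illustration of the bug class: a parser that trusts a count declared in a file header can index past the data actually present, and in kernel-mode C the same pattern reads out-of-bounds memory silently instead of raising an exception.

    import struct

    def parse_records(blob: bytes) -> list[int]:
        """Toy format: a 4-byte big-endian record count, then 4-byte records."""
        (declared,) = struct.unpack_from(">I", blob, 0)
        payload = blob[4:]
        actual = len(payload) // 4
        # The bounds check at issue: without it, the loop below would index
        # past the records the file actually contains.
        if declared > actual:
            raise ValueError(f"header claims {declared} records, payload has {actual}")
        return [struct.unpack_from(">I", payload, i * 4)[0] for i in range(declared)]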
MAJOR
Cloudflare R2
Cloudflare R2 disruption breaks asset serving for dependent sites
A disruption to Cloudflare's R2 object storage service caused widespread issues for websites and applications that depend on it for assets and media. Cloudflare published a detailed post-mortem attributing the issue to a misconfigured network device.
DURATION
~1 hour
AFFECTED
Sites and apps dependent on Cloudflare R2 globally
ROOT CAUSE
Misconfigured network device in Cloudflare's core infrastructure affecting R2 data plane
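Incidents like this reward keeping a second, independent origin for critical assets. A client-side sketch, with hypothetical URLs standing in for an R2-backed primary and an unrelated mirror:

    import requests

    ASSET_ORIGINS = [
        "https://assets.example.com",         # hypothetical R2-backed primary
        "https://assets-mirror.example.com",  # hypothetical independent mirror
    ]

    def fetch_asset(path: str, timeout: float = 5.0) -> bytes:
        """Try each origin in order, falling through on errors or non-200s."""
        last_error = None
        for origin in ASSET_ORIGINS:
            try:
                resp = requests.get(f"{origin}/{path}", timeout=timeout)
                if resp.status_code == 200:
                    return resp.content
                last_error = RuntimeError(f"{origin} returned {resp.status_code}")
            except requests.RequestException as exc:
                last_error = exc
        raise last_error

The same idea applies server-side via replicated buckets or a multi-CDN setup.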
MAJOR
AWS us-east-1
AWS us-east-1 power event affects multiple services
An electrical event at an AWS data center in Northern Virginia caused disruptions to EC2 instances, RDS databases, and several managed services. Customers running workloads exclusively in us-east-1 without cross-region failover were most affected.
DURATION
~4 hours
AFFECTED
AWS customers in us-east-1 running EC2, RDS, and dependent managed services
ROOT CAUSE
Unplanned power event at a Northern Virginia data center facility
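The cross-region failover mentioned above can be as simple as probing per-region health endpoints and using the first region that answers. Production setups usually delegate this to Route 53 health checks or a global load balancer; the sketch below (with hypothetical endpoints) shows the client-side version:

    import requests

    REGION_ENDPOINTS = {  # hypothetical per-region deployments of one service
        "us-east-1": "https://api.use1.example.com/healthz",
        "us-west-2": "https://api.usw2.example.com/healthz",
    }

    def pick_healthy_region(timeout: float = 2.0) -> str:
        """Return the first region whose health endpoint answers 200."""
        for region, url in REGION_ENDPOINTS.items():
            try:
                if requests.get(url, timeout=timeout).status_code == 200:
                    return region
            except requests.RequestException:
                continue
        raise RuntimeError("no healthy region available")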
MAJOR
Meta
Facebook, Instagram, WhatsApp, Messenger, and Threads go down simultaneously
Facebook, Instagram, WhatsApp, Messenger, and Threads all went down simultaneously for around two hours. Hundreds of millions of users were unable to log in or use the services. Meta attributed the outage to a technical issue during a configuration change.
DURATION
~2 hours
AFFECTED
Hundreds of millions of users worldwide
ROOT CAUSE
Configuration change that caused a cascading failure across Meta's backbone infrastructure
MAJOR
GitHub
GitHub degradation blocks Actions, clones, webhooks, and API
GitHub experienced significant degradation affecting Actions, repository clones, webhooks, and API availability. The incident impacted CI/CD pipelines worldwide, blocking deployments for developers and engineering teams.
DURATION
~3 hours
AFFECTED
Developers and teams globally relying on GitHub Actions and Git operations
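One inexpensive guard for pipelines that depend on GitHub is checking its public Statuspage summary before starting a deploy, rather than burning retries against a known outage. A sketch against the standard Statuspage v2 endpoint, whose indicator field uses the none/minor/major/critical scale:

    import requests

    STATUS_URL = "https://www.githubstatus.com/api/v2/status.json"

    def github_degraded() -> bool:
        """True if GitHub's status page reports a major or critical incident."""
        try:
            indicator = requests.get(STATUS_URL, timeout=5).json()["status"]["indicator"]
        except (requests.RequestException, KeyError, ValueError):
            return True  # status page unreachable: assume the worst
        return indicator in ("major", "critical")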
Data sourced from public post-mortems, official status pages (statuspage.io, instatus.com), and credible tech reporting. Duration estimates are approximate, derived from incident start and resolution timestamps. If you notice an inaccuracy or want to suggest an addition, contact us.