The 10 Biggest Cloud Outages Of 2024

AT&T, Verizon, CrowdStrike and Microsoft are among the companies that experienced major cloud service outages during the last year.

A February AT&T outage that drew attention from federal regulators. Issues in September for Verizon customers. And a certain cybersecurity vendor’s update that crashed Windows machines worldwide.

These are among the biggest cloud service outages the world faced in 2024 (as of Dec. 4).

For the list, CRN focused on cloud issues of particular importance to solution providers and skipped outages for consumer products such as Meta’s Facebook and Instagram. Their March 5 outage was ranked the largest of the year by Downdetector parent Ookla, with more than 11.1 million people reporting issues.

2024 Cloud Outages

An October report from observability tech provider New Relic, based on a survey of 1,700 technology professionals worldwide, showed that the median annual downtime from high-impact outages was 77 hours, with an hourly cost of up to $1.9 million in lost revenue and productivity and other expenses.

Engineering teams said they devote 30 percent of their time, or 12 hours of every 40-hour work week, to addressing service interruptions, the report found. Network failure, third-party or cloud provider service failure, and human error were the leading causes of unplanned outages.
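As a rough check on the New Relic figures quoted above, the arithmetic can be sketched in a few lines of Python. The function names are illustrative, and multiplying the median downtime by the top-end hourly cost is an assumption made here to show an upper bound; the report itself is not quoted as combining the two numbers that way.

```python
# Figures quoted from the New Relic report described above.
MEDIAN_DOWNTIME_HOURS = 77        # median annual downtime, high-impact outages
MAX_HOURLY_COST_USD = 1_900_000   # "up to $1.9 million" per hour
HOURS_ON_INTERRUPTIONS = 12       # hours per week spent on interruptions
WORK_WEEK_HOURS = 40


def share_of_week(hours: float, week_hours: float) -> float:
    """Return the fraction of the work week spent on interruptions."""
    return hours / week_hours


def worst_case_annual_cost(downtime_hours: int, hourly_cost: int) -> int:
    """Upper bound only: assumes every downtime hour costs the top-end rate."""
    return downtime_hours * hourly_cost


if __name__ == "__main__":
    share = share_of_week(HOURS_ON_INTERRUPTIONS, WORK_WEEK_HOURS)
    print(f"Share of week: {share:.0%}")
    cost = worst_case_annual_cost(MEDIAN_DOWNTIME_HOURS, MAX_HOURLY_COST_USD)
    print(f"Worst-case annual cost: ${cost:,}")
```

The 12-of-40-hours figure works out to the 30 percent the report cites. The worst-case product is illustrative only, since actual hourly costs vary by organization and outage.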

“The State of Resilience 2025,” an October report from database vendor Cockroach Labs based on engagement with 1,000 senior technology executives worldwide, found that 84 percent of respondents said they lost at least $10,000 in revenue due to an outage in the last 12 months. One-third said they lost anywhere from $100,000 to more than $1 million.

Other CRN Year in Review articles so far include The 10 Hottest Cybersecurity Startups Of 2024, The 10 Hottest Semiconductor Startups of 2024, and The 10 Coolest Open-Source Software Tools Of 2024.

Read on to learn more about the 10 biggest cloud outages of 2024.

Database Upgrade Sinks Jira In January

Atlassian’s start to the year was less than smooth, with its Jira project management tool giving users 503 “service unavailable” messages and other error warnings for about four hours starting at 6:52 UTC (Coordinated Universal Time) on Jan. 18.

ThousandEyes said that Jira services were back to normal operations by 10:30 UTC. The issues hit Jira Work Management, Jira Software, Jira Product Discovery and other services offered by Australia-based Atlassian, according to a ThousandEyes report issued on Feb. 2.

Atlassian attributed the degraded performance across the Jira family of products to “a scheduled database upgrade on an internal Atlassian Marketplace service.”

“This degraded performance manifested in increasing response times and eventually time outs,” according to the vendor. “This service degradation then cascaded upstream and resulted in requests timing out across the Jira family of products, impacting product experiences.”

February AT&T Outage Catches FCC Attention

On Feb. 22, AT&T users reported outages for the telecommunications giant’s services, including internet access. Downdetector logged more than 3.4 million user reports from the issue, which lasted more than 12 hours.

On Feb. 25, AT&T CEO John Stankey said in a statement that the outage appeared “due to the application and execution of an incorrect process used while working to expand our network.” The vendor also offered $5 credits to customers affected by the outage.

In July, the Federal Communications Commission issued a report on the incident, attributing the cause to a lack of peer review, failure to adequately test post installation, insufficient safeguards and controls to get approval of changes that affect the network, and other factors.

The report noted that AT&T has made changes to prevent the issue from happening again, including “scanning the network for any network elements lacking the controls that would have prevented the outage, and promptly putting those controls in place.” The report said the incident was referred to the Enforcement Bureau “for potential violations of parts 4 and 9 of the Commission’s rules.”

Downdetector parent Ookla called this the third largest outage in the world in 2024 and the largest operator outage in the world since 2020.

Metadata Store Troubles Google Cloud In February

On Feb. 14, a regional metadata store issue resulted in disruption for Google Cloud us-west1 users, ThousandEyes said in a March 1 posting.

The incident lasted about two hours and 40 minutes, according to Google. “Our engineering team mitigated the issue by isolating the problematic traffic and have implemented measures to prevent a recurrence,” Google said, attributing issues to its regional metadata store.

The outage hit a variety of Google Cloud products, including Vertex AI products and Identity and Access Management (IAM).

Faulty CrowdStrike Update

Arguably the most consequential outage of the year was the faulty CrowdStrike update that crashed millions of Microsoft Windows machines worldwide. The incident continues to play out with Delta and CrowdStrike suing each other over who's to blame for the airline’s 7,000 canceled flights over five days.

In the aftermath of the outage, Microsoft has revisited how security tech vendors develop products for Windows. In November, the tech giant said it is working on a way to allow security products to avoid directly accessing the Windows kernel and instead run in user mode, as applications do.

CrowdStrike’s access to the kernel, the core control center of Windows, has been pinpointed as a key factor that enabled the defective July 19 CrowdStrike Falcon update to send 8.5 million Windows devices into a “blue screen of death” state, leading to widespread business and even societal disruptions.

July Microsoft Outages

Beyond the faulty CrowdStrike update debacle, Microsoft experienced service disruption headaches in July.

On July 30, Azure Front Door (AFD), Azure Content Delivery Network (CDN) and downstream services that rely on them suffered an outage, with parts of the Microsoft network degrading around 10:30 UTC, according to ThousandEyes.

Microsoft blamed the problem on default traffic routing not resuming as expected following automatic mitigation from an attempted distributed denial-of-service (DDoS) attack and a power outage at a site in Europe.

Microsoft said in a post-incident report that availability returned to pre-incident levels by 19:43 UTC. The vendor said it would make the incident less likely to occur again and less impactful by making sure DDoS mitigation issues in one geography don’t spread to others and by improving monitoring and invalid configuration detection.

September Issues For AT&T, Microsoft

On Sept. 12, AT&T users were unable to access Microsoft 365 and Azure services due to “a third-party Internet Service Provider incident that impacted a subset of their customers' ability,” according to Microsoft.

In a post on X, Microsoft indicated that the outage was caused by an unspecified “change” within the managed environment of a third-party internet service provider, elsewhere identified as AT&T.

ThousandEyes described the issue as “limited to a subset of users connecting to Microsoft’s network directly from or through the AT&T peering point.”

For about 90 minutes, “customers using AT&T to connect to Microsoft services experienced issues accessing our services,” Microsoft said on its Azure status page.

Microsoft’s post-incident report said the issue lasted from 11:46 UTC to 13:14 UTC.
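The “about 90 minutes” figure can be checked against the timestamps in Microsoft’s post-incident report. The short Python sketch below is illustrative only: the function name and timestamp format are assumptions made for this example, not part of any vendor tooling.

```python
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M"  # assumed input format, times in UTC


def outage_minutes(start: str, end: str) -> int:
    """Return the whole minutes between two UTC timestamps."""
    started = datetime.strptime(start, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
    ended = datetime.strptime(end, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
    if ended < started:
        raise ValueError("end must not be earlier than start")
    return int((ended - started).total_seconds() // 60)


if __name__ == "__main__":
    # Sept. 12 window from Microsoft's post-incident report.
    print(outage_minutes("2024-09-12 11:46", "2024-09-12 13:14"))  # 88
```

The reported window works out to 88 minutes, consistent with the roughly 90 minutes cited on the Azure status page.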

ServiceNow Outage

The unsuccessful update of an expired root certificate was blamed for a ServiceNow outage experienced by about 600 customers on Sept. 23 starting around 2:00 UTC.

A management, instrumentation and discovery (MID) server was hit by the outage and some customers saw connectivity problems between cloud instances and the servers, according to a ThousandEyes report on the issue.

“The outage serves as a reminder of the critical role each function in a digital ecosystem or end-to-end delivery chain plays in maintaining seamless operations,” according to the report. “An application or service is only as strong as its weakest link.”
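Because an expired root certificate was blamed for this outage, a basic expiry check is one guardrail worth illustrating. The Python sketch below is a generic example and not ServiceNow’s tooling: the function names, the 30-day warning threshold and the sample dates are all assumptions made for illustration. It relies on the standard library’s `ssl.cert_time_to_seconds`, which parses the `notAfter` date format that Python’s `ssl` module returns for certificates.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter date.

    not_after uses the format 'Mon DD HH:MM:SS YYYY GMT'.
    now is seconds since the epoch; defaults to the current time.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / SECONDS_PER_DAY


def needs_renewal(not_after: str, warn_days: int = 30,
                  now: Optional[float] = None) -> bool:
    """True when the certificate is expired or inside the warning window."""
    return days_until_expiry(not_after, now) <= warn_days


if __name__ == "__main__":
    # Hypothetical dates, chosen only to demonstrate the check.
    today = ssl.cert_time_to_seconds("Jan 10 00:00:00 2024 GMT")
    print(needs_renewal("Feb 09 00:00:00 2024 GMT", warn_days=30, now=today))
```

A check like this only flags an approaching expiry. It does not confirm that a replacement certificate was deployed successfully, which is the step that reportedly failed here.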

Verizon Issues In September

Verizon services in the U.S. experienced outages on Sept. 30, with Downdetector logging more than 1.7 million reports.

Users from New York to Los Angeles reported no service or limited service, with the exception of “SOS mode,” which allows users to place emergency calls by falling back on other carrier networks within range.

Downdetector put the start of the issue at around 9:30 a.m. ET, with Verizon posting on X about the issue at 11:48 a.m. ET. More than 100,000 incident reports were submitted just between 11:15 a.m. and 11:30 a.m., bringing the total number of reports to over 400,000 at that point.

Verizon said the issue was resolved at 7:18 p.m. ET, about 10 hours after it began.

A Cloudflare post about the outage said that HTTP traffic fell as much as 9 percent below expected levels during the outage, with places such as Omaha, Neb., seeing traffic down about 30 percent.

Downdetector parent Ookla ranked the September Verizon outage among the largest in the world in 2024 (No. 4 in the U.S.), with 2.4 million users reporting issues.

October Salesforce Outage

On Oct. 1, customer relationship management (CRM) software giant Salesforce experienced a global service outage with sandbox instances running at 50 percent capacity at 2:40 UTC.

“During the impact period, users may not have been able to access Salesforce services, and a further subset could log in but experienced poor performance,” Salesforce said in a November report on the issue. “Users may have received a ‘We are down for maintenance’ error message during the disruption and performance degradation.”

The company’s report on the outage said that “the full rollout of the emergency release took 14 hours due to the capacity limits on the number of cells that can be upgraded in parallel,” with “manual efforts to suppress restarts and add the missing metadata” mitigating the effects.

Salesforce blamed the disruption on “a missing time-specific configuration” that “prevented Core Application (core app) servers from starting up.”

November Microsoft Outages

Whatever Microsoft was thankful for in 2024, it probably didn’t include the day-plus outage of its Outlook and Teams products just before Thanksgiving, resulting in headlines in national news outlets.

On Nov. 26, CNN cited more than 5,000 user-reported problems tied to the outage. Microsoft identified problems at 1:06 a.m. PT on Nov. 25 and reported the issue had been resolved at 12:07 p.m. PT on Nov. 26. Microsoft blamed the outage on “a recent change.”

ThousandEyes said it observed server errors, timeouts and packet loss for Outlook online and other Microsoft products starting at 2:00 UTC Nov. 25.