The 10 Biggest Cloud Outages Of 2020 (So Far)

Cloud providers for the most part have met the tremendous surge in demand caused by the coronavirus crisis, with a few notable exceptions. Here are the 10 biggest cloud outages that have impacted users so far this year.

No Time For Downtime

The coronavirus crisis has tested cloud providers in ways none of them could have predicted just a few months earlier. For the most part, the industry has met the moment, maintaining availability and stability amid a sudden and tremendous surge in demand from a global population plunged into far greater reliance on cloud services to keep working, learning and staying entertained.

While the public cloud has proven incredibly resilient in the face of an unprecedented stress test, there have been notable exceptions to that rule in the first half of 2020. But only a few of the largest cloud outages so far this year directly stem from a coronavirus-related surge in usage—the rest resulted from the kind of common mishaps and failures that to some degree seem inevitable, even when they’re infuriating.

Here are the 10 biggest cloud outages that have impacted users so far this year.

Twitter, Feb. 7

In February, Twitter suffered a partial outage that left some users unable to send tweets.

“Tweeting is broken. We're working on fixing it,” tweeted Patrick Traughber, product manager for the social media giant.

According to Downdetector.com, complaints of service disruptions spiked to 12,000 by 5 p.m. ET, mostly in the U.S. and Europe.

“Sorry for the interruption and we’ll let you know when things are back to normal,” Twitter’s support team tweeted.

Twitter soon realized a recent update containing bad code was the culprit and rolled back the change. By 5:07 p.m. ET, the support account notified distraught social media users: “you can get back to Tweeting –– this problem has been fixed! Thanks for sticking with us through that.”

Microsoft Azure, March 3

A six-hour outage, starting at 9:30 a.m. ET, struck the U.S. East data center for Microsoft’s Azure cloud, limiting the availability of Azure cloud services for some North American customers.

A few days later Microsoft disclosed that a cooling system failure was to blame. Malfunctioning building automation controls caused a reduction in airflow, and the subsequent temperature spikes throughout the data center hampered performance of network devices, rendering compute and storage instances inaccessible.

Microsoft ultimately reset the cooling system controllers, and once the temperature fell, engineers power-cycled hardware to resume services.

Microsoft Teams, March 16

Microsoft Teams suffered an outage lasting two hours in Europe as a surge of new users turned to the collaboration platform amid the onset of the coronavirus crisis, stressing its capacity.

Microsoft tweeted it was "investigating messaging-related functionality problems within Microsoft Teams" as of 4:50 a.m. ET. Several reports indicated the suite of applications had gone down completely for European users.

In a statement provided to CRN, Microsoft said, “We’ve taken steps to address an issue that a subset of our customers may have experienced. Our engineering teams continue to actively monitor performance and usage trends.”

Two weeks earlier, Microsoft had pledged to offer a free Office 365 E1 subscription for six months to businesses and educational institutions that weren't currently licensed for Teams.

Microsoft Azure, March 24-26

Microsoft confirmed that a series of March outages affecting European customers was caused by the strain the COVID-19 pandemic placed on several cloud services.

Developers were hit especially hard, as the first casualty on March 24 was Azure Pipelines, a continuous delivery service used by DevOps teams. For the next few days, software development pipelines experienced significant delays.

“This incident was caused by VM capacity constraints arising from the global health pandemic that led to increased machine reimage times and then increased wait times for available agents,” Microsoft later explained.

By the end of the week, Microsoft accepted blame for not promptly addressing the failure.

“On the first day, when the impact was most severe, we didn’t acknowledge the incident for approximately five hours, which is substantially worse than our target of 10 minutes,” Engineering Director Chad Kimes said.
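To put that gap in perspective, the arithmetic can be sketched in a few lines of Python. The function name `delay_ratio` is hypothetical, and the five-hour figure is the approximate value Kimes cited rather than an exact measurement.

```python
from datetime import timedelta


def delay_ratio(actual: timedelta, target: timedelta) -> float:
    """Return how many times longer the actual delay was than the target."""
    return actual / target  # dividing two timedeltas yields a float


# Figures quoted in the article; "approximately five hours" is rounded.
target_ack = timedelta(minutes=10)
actual_ack = timedelta(hours=5)

ratio = delay_ratio(actual_ack, target_ack)
print(f"Acknowledgment took about {ratio:.0f}x the target time")
```

On those rounded figures, the acknowledgment took roughly 30 times longer than the stated goal.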

Google Cloud Platform, March 26

Google users started reporting problems accessing several cloud services just after 11 a.m. on March 26.

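Many tweeted that they encountered HTTP 500 and 502 errors from Google services. A 500 code means a request failed because of an internal server error; a 502 code denotes a bad gateway, meaning a server acting as a gateway or proxy received an invalid response from an upstream server.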
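Both codes belong to the standard HTTP 5xx class, which signals a fault on the server side rather than a mistake by the user. As a minimal sketch, Python's standard `http.HTTPStatus` enum can look up the official phrase for each code; the helper name `describe_status` is hypothetical and not part of any Google tooling.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Label an HTTP status code with its standard phrase and error class."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    if 500 <= code <= 599:
        kind = "server error"
    elif 400 <= code <= 499:
        kind = "client error"
    else:
        kind = "not an error"
    return f"{code} {status.phrase} ({kind})"


# The two codes users reported during the outage.
for code in (500, 502):
    print(describe_status(code))
```

Because both codes fall in the 5xx range, retrying later is usually the only remedy available to an end user.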

Google ultimately described the outage as having to do with its “infrastructure components.”

Google customers on the Eastern seaboard seemed most impacted, according to Downdetector, which offers real-time status and outage information for service providers.

Zoom, April 3

Zoom Video Communications became one of the world’s most-essential service providers as the COVID-19 pandemic forced remote working and distance learning across the globe, resulting in a staggering surge in demand for the platform.

That stress seems to have caused an April 3 outage that jolted Zoom users on the East Coast of the U.S. and parts of Europe who relied on the platform as part of their “new normal” work environment. To a lesser extent, the outage was felt in parts of California, Florida and the Midwest, as well as Malaysia, according to Downdetector.com.

Error messages reported upon login attempts suggested a problem with the Zoom web client, which Zoom’s status page said was under maintenance.

"During these tough times, we are seeing a massive increase in demand for our services. To continue serving our incredible services to our customers and developers, we may be making changes rapidly," the company wrote on its Developer Forum page.

The company had been offering free video options for education organizations and promoting free 40-minute video meetings for businesses and consumers. That contributed to a 151 percent year-over-year spike in March in daily active users.

Google Cloud Platform, April 8

A failure involving Google Cloud’s Identity and Access Management (IAM) API locked users out of their Gmail accounts and disrupted other popular services built on Google Cloud.

The IAM issues, which started at 10:35 a.m. ET and lasted just under 90 minutes, tripped up multiple Google services, including App Engine, Cloud Functions, BigQuery and its core Compute Engine IaaS.

Nest, a Google sister company, had to explain to customers why their security cameras temporarily failed to record footage. And Snapchat, a prominent Google Cloud customer, was completely down for more than an hour.

GitHub, April 21

GitHub, the source code repository owned by Microsoft, saw multiple outages near the end of April.

GitHub services first struggled for more than an hour on April 21. The next day, two back-to-back outages again stalled the work of developers who rely on the platform, and a day later another outage affected multiple GitHub services for more than an hour.

Git Operations, API requests, pull requests and other functionality that software engineers rely on as part of their day-to-day work were degraded. Developers went to Twitter to criticize Microsoft for a lack of transparency as the rolling outages continued through the week.

Adobe Creative Cloud, May 28

Creative professionals essentially took a day off when Adobe’s cloud platform, which includes popular titles like Photoshop, InDesign and Premiere Pro, went down for roughly an entire business day.

The Adobe Creative Cloud outage stirred up talk in the digital design community about the downside of cloud-based services, with many wondering on Twitter whether they preferred the on-premises implementations of those products.

Starting around 9 a.m. ET, Adobe customers reported they couldn’t log in to the platform and access their projects. Some said they couldn’t even contact support because they were locked out of their accounts.

More than seven hours after problems first came to light, Adobe tweeted they were resolved, but didn’t elaborate on the root cause.

IBM Cloud, June 9

IBM blamed a third-party networking failure for a serious cloud outage that brought many Big Blue customers, including some popular websites, to a sudden halt.

The CEO of one IBM Business Partner told CRN that customers across the U.S. lost access to their environments, status screens and consoles, and that they had “no sense of what was happening.”

“It affected everything,” he said. “The whole environment was down.”

The IBM Cloud status page, which also was briefly down during the Tuesday disruption, reported a slew of issues that were resolved after 6:30 p.m. ET.

“The network operations team adjusted routing policies to fix an issue introduced by a 3rd party provider and this resolved the incident,” the IBM status page explained.