The 10 Biggest Cloud Outages Of 2019 (So Far)
Here are the 10 cloud outages so far this year that have generated the most public backlash and, accordingly, done the most to erode trust in the cloud services model for IT.
The Outage Outlook
Any prolonged cloud outage is a public relations nightmare not only for the company that experienced it, but for the industry as a whole.
As the cloud matures, service disruptions are getting shorter in duration and less frequent. Still, failures are inevitable, and when they occur, the angriest of users, be they enterprise customers reliant on mission-critical apps or consumers of social networks, will surely make their voices heard.
Here are the 10 outages so far this year that have generated the most public backlash and, accordingly, done the most to erode public trust in the whole cloud services model for IT.
Microsoft Office: Jan 24 and Jan 29
Two outages within a week were not how Microsoft wanted to usher in the new year.
On Jan. 24, the Software-as-a-Service giant reported customers in Europe couldn't access their Office 365 Exchange Online mailboxes that morning.
"We've determined that a subset of Domain Controller infrastructure is unresponsive, resulting in user connection time outs," Microsoft tweeted.
Less than a week later, on Jan. 29, Office 365 struggled with a problem that affected more services and customers across a wider geography.
That outage started somewhat inconspicuously, with Skype for Business dropping calls and reported problems using Microsoft Teams. It then got wider attention as it cascaded to more Office 365, Azure and Dynamics services, with users often unable to log in to accounts using Azure Active Directory.
Microsoft later blamed an outage at DNS provider CenturyLink, which acknowledged a software defect affecting connectivity to customers' cloud resources.
Google Gmail and Drive: March 12
Major problems with Gmail and Google Drive were first reported just before 8 p.m. PT on March 12, followed by glitches affecting YouTube.
On its status page, Google said users were seeing "error messages, high latency, and/or other unexpected behavior."
Gmail users complained of problems sending emails. Google Drive users reported that certain files weren't opening, and that performance of the cloud storage solution was degraded. The outages lasted for roughly three-and-a-half hours.
Facebook, Instagram: March 13
Facebook and its photo-sharing subsidiary, Instagram, both suffered partial service outages the morning of March 13. The outages not only impacted consumers, but also developers building apps on the world's largest social network.
A Facebook engineer initially wrote on the company’s server status page that the company was "experiencing issues that may cause some API requests to take longer or fail unexpectedly."
It took almost a full day before error rates returned to normal.
Facebook quickly shot down the suggestion that a DDoS attack was the root cause. Instead, the company blamed a "server configuration change."
Microsoft Azure: May 2
Several core Microsoft cloud services, including compute, storage, an application development platform, Active Directory and SQL database services, were impacted by a nearly three-hour DNS outage on May 2.
Some of Microsoft's cloud-based applications, including Microsoft 365, Dynamics and Azure DevOps, were also impacted.
According to Microsoft's Azure status page, the underlying root cause was a nameserver change that affected DNS resolution, harming the downstream services.
"During the migration of a legacy DNS system to Azure DNS, some domains for Microsoft services were incorrectly updated. No customer DNS records were impacted during this incident, and the availability of Azure DNS remained at 100% throughout the incident. The problem impacted only records for Microsoft services," the page said.
ConnectWise: May 3
Many in the channel rely on ConnectWise to maintain their relationships and service agreements with customers. So an outage on May 3 of the ConnectWise Manage platform in Europe struck solution providers on that continent particularly hard.
ConnectWise later revealed a ransomware attack forced it to go offline in the EU. The attack did not compromise any personal data, the company said in a letter to partners.
The attack came through an off-site machine that ConnectWise used for cloud-performance testing outside its network. The company said it hired a forensics firm to investigate the attack and has taken steps to make sure it cannot be duplicated.
ConnectWise was forced to restore systems from backups, resulting in some data loss. It issued a credit note to customers affected by the outage.
Salesforce: May 20
Salesforce grappled with the worst service disruption in its history for several days in late May—one that caused sales agents and marketers around the world to lose access to customer information.
The outage was caused by a faulty database script for Pardot marketing automation software. Pardot customers reported all their users could see and edit the entirety of the company's data on the system.
To stop the permissions failure from exposing sensitive information, Salesforce cut access to the larger Salesforce Marketing Cloud, then struggled for three days to restore it to all its customers.
Google Cloud: June 2
Cascading errors created a network congestion problem on June 2 that brought down many Google Cloud services, along with large GCP customers like Snapchat and Shopify, for roughly four hours.
Google Cloud, G Suite apps and YouTube all had problems during what Google categorized as "a major outage, both in scope and duration."
The problem was caused by a software bug combined with two misconfigurations during maintenance procedures that would, in themselves, have been benign, Google said.
Google Calendar And Hangouts Meet: June 19
Service disruptions hit Google Calendar and Google Hangouts Meet the morning of June 19, dealing another blow to consumers and G Suite users who had already dealt with multiple Google Cloud issues in the previous months.
The Mountain View, Calif.-based internet giant acknowledged the two outages in mid-morning messages on the G Suite status page.
Days earlier, Gmail spam filters stopped working for an hour, and at the beginning of the month, a widespread outage impacted multiple Google Cloud services.
Cloudflare, AWS, Verizon: June 24
Cloudflare, which provides content delivery network services for millions of websites, including major cloud providers, experienced a massive outage that brought much of the Internet to its knees for a couple of hours on June 24.
Among the victims of the outage was cloud behemoth Amazon Web Services, which reported Internet connectivity problems for customers caused by an "external provider."
Cloudflare blamed Verizon for the "small heart attack" the Internet suffered.
The telecom giant, one of the major Internet transit providers, created a major Border Gateway Protocol (BGP) routing leak, partly thanks to a "BGP Optimizer" product from a company called Noction.
The result was "the equivalent of Waze routing an entire freeway down a neighborhood street."
Slack: June 28
"Something's not quite right" with Slack, the collaboration service's status page reported on June 28.
The extremely popular platform, relied on for internal communications by tens of millions of office workers, started registering problems around the globe before 5 a.m. PT. Customers reported "multiple issues regarding Slack's degraded performance" across all its popular services: login, messaging, posting files, calls and integrations with other apps through APIs.
According to Slack, some of its servers became unavailable that Friday, resulting in "degraded performance" for job processing for around seven hours. The company registered a 10-25 percent job error or failure rate during that time.