The 10 Biggest Cloud Outages Of 2021 (So Far)

‘Outages can mean the end for companies, depending on their choices in design and deployment, or they can be complete non-events,’ Miles Ward, chief technology officer at Los Angeles-based Google partner SADA Systems, tells CRN. ‘Cloud has changed the nature of outages.’

Verizon, Microsoft and Google were just some of the cloud providers to see their services interrupted so far this year by a variety of issues, ranging from a change in an authentication system to a deadly winter storm. In the cloud computing era, some experts say we can only expect more outages, but less severe ones.

Miles Ward, chief technology officer at Los Angeles-based Google partner SADA Systems, told CRN that cloud outages can prove less disastrous than when data centers have issues. With cloud-related issues, providers can fix the problem in parallel with a user’s team, whereas data centers can require an internal team to fix problems.

“Outages can mean the end for companies, depending on their choices in design and deployment, or they can be complete non-events,” Ward said. “Cloud has changed the nature of outages.”

As cloud adoption and the number of regions, zones and cloud services grow, everyone should prepare for more outages, Ward said. But he expects the type of global, all-service outages that garner headlines to decrease.

“Every cloud engineering team has seen how impossible it is for customers to engineer around these kinds of outages and is working hard to distribute, subdivide, and make fault-tolerant these central services,” Ward said. “The result may be a shift of focus where you might see even more minor failures in singleton services, while the global services survive seemingly unaffected by minor failures because of this investment in resilience.”

Companies today need to keep copies of their data in distant regions, run instances in multiple zones and use automation to cut down on the time it takes to fix an outage, Ward said. At SADA, even demos are designed with high availability to run across Google Cloud and AWS.

In the meantime, CRN has collected a list of some of the largest cloud outages and issues to hit computers this year. Here’s what you need to know.

January Verizon Outage

Less than a month into 2021, a massive Verizon Fios outage affected thousands of customers in the northeastern U.S., at the same time that services such as Google and Zoom appeared to suffer issues as well.

Initial reports blamed a fiber cut in Brooklyn, but Verizon later confirmed to Business Insider that the disruption came from “a software issue triggered during routine network management activities.”

Verizon customers across Boston, New York, Philadelphia, Baltimore, and Washington, D.C., saw their internet services slowed or stopped completely. According to Downdetector, which tracks internet outages, there were more than 22,000 reports of outages at around 12:14 p.m. ET. That number dropped to more than 3,200 at 1:30 p.m. ET.

February Microsoft Teams Issues

February saw two separate issues with the Microsoft Teams collaboration app. First, on Feb. 4, an issue prevented some North American users from joining meetings. By the afternoon that same day, Microsoft issued a statement to say “we resolved the short interruption that a subset of customers in North America may have experienced connecting to meetings or live events.”

An outage map on Downdetector.com showed the Teams issues affecting users in numerous major cities in the U.S. and Canada, including New York, Washington, Chicago, Toronto, San Francisco, Los Angeles, Dallas, Phoenix, Atlanta, Seattle and Boston.

On Feb. 17, Teams was hit by a possible networking issue that led to delays in receiving chat messages for some North American users. Microsoft also reported issues with running live events in Teams. Along with North America, some users in South America were affected. The issue with delayed chat messages was resolved after roughly five hours.

“We rerouted services to alternate infrastructure, and the impact to message delivery has been mitigated,” the company said in a tweet from the Microsoft 365 Status account at 4:07 p.m. ET.

For solution providers and their customers, “the infrequent but annoying feature outages of Teams are undoubtedly frustrating,” Ben Wilcox, senior vice president of solution architecture at iV4, a unit of Atlanta-based ProArch, told CRN at the time.

February Texas Winter Storm

Mother Nature, rather than any man-made technical hitch, wreaked havoc with Texas businesses’ IT systems in February thanks to a string of unscheduled power blackouts caused by a winter storm.

By early morning Feb. 17, 2.7 million households were without power, according to the Electric Reliability Council of Texas, or ERCOT, which manages the flow of electric power to over 26 million Texas customers, or about 90 percent of the state’s electric load.

The unusually cold weather blanketed most of Texas with layers of snow, claimed the lives of more than 100 people and caused power outages affecting millions of homes and businesses.

Texas solution providers told CRN at the time that power, internet services and water services came and went sporadically.

Another Verizon Outage

Verizon experienced another internet outage a little after 8 a.m. ET on March 3. The Verizon outage, according to Downdetector.com, impacted parts of Washington, D.C., and several states, including Maryland, Massachusetts, New York, Pennsylvania, and Virginia. The issue was fixed within the day.

Verizon users across the Northeast and Mid-Atlantic states took to Twitter to report moderate to critical service disruptions, with many users complaining about spotty connections or complete outages, especially as it related to VPN connectivity.

At 9:53 a.m. ET, Downdetector.com had received nearly 5,000 reports of service disruptions. By around noon, the website showed 467 reports of service issues related to Verizon.

A little after 10 a.m. ET, Harvard University Information Technology posted to its service status page that it was monitoring an incident impacting Verizon network services. “Users may experience network or VPN connection issues due to a Verizon global issue affecting” the East Coast, the update said.

Another Microsoft Outage

Microsoft in March reported a global outage affecting the Teams collaboration app, as well as “multiple” other Azure, Office 365 and Dynamics 365 services. Microsoft blamed the Teams and Azure outage on “an issue with a recent change to an authentication system.”

Microsoft reported that the widespread outage was largely resolved after about four hours. A map on Downdetector showed the Teams outage affecting cities including New York, Washington, Chicago, Toronto, San Francisco, Los Angeles and Seattle.

At 5:57 p.m. ET March 15, Microsoft tweeted that “the update has finished deployment to all impacted regions. Microsoft 365 services are showing decreasing error rates in telemetry.”

An IT director who was grappling with the outage at the time told CRN, “I can’t get into the Microsoft console to even see what is going on.”

“Anytime you see any vendors spike like this on Downdetector it is bad,” said the IT director, who did not want to be identified. “But until you understand what the cause is, you don’t know if it is going to take them five minutes or five hours to fix.”

Microsoft Outages In April

About two and a half weeks after the March 15 Microsoft outage, the Redmond, Wash.-based tech giant reported that Domain Name System (DNS) issues led to an outage that affected cloud services including Azure, Teams and Dynamics 365 on April 1. The problem was fully mitigated after about five hours.

Microsoft said the problems stemmed from an unexpected increase in DNS traffic. DNS provides the directory that’s used to match domain names with their associated IP addresses.
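To make the directory lookup concrete, here is a minimal sketch using Python's standard `socket.getaddrinfo` call, which asks the system resolver for the addresses behind a name. The `resolve` helper and its `lookup` parameter are illustrative assumptions, not part of any Microsoft tooling.

```python
import socket
from typing import Callable, List


def resolve(hostname: str, lookup: Callable = socket.getaddrinfo) -> List[str]:
    """Return the unique IP addresses that `lookup` reports for `hostname`."""
    results = lookup(hostname, None)
    addresses: List[str] = []
    for result in results:
        # Each result is (family, type, proto, canonname, sockaddr);
        # the first element of sockaddr is the IP address as a string.
        ip = result[4][0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

On a networked machine, `resolve("example.com")` returns whatever addresses the resolver currently hands back. When the name cannot be resolved, `socket.getaddrinfo` raises `socket.gaierror`, which is roughly how a DNS problem surfaces to applications even when the servers behind the name are healthy.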

The peak of the issues occurred between about 5:30 p.m. and 6:30 p.m. ET on April 1, and the problem was fully mitigated as of about 10:30 p.m., Microsoft said.

A global outage on April 27 impacted users of the Microsoft Teams videoconferencing and collaboration platform. Microsoft fixed the issue the same day by 9:03 a.m. ET.

Microsoft confirmed the global outage in a 6:53 a.m. post: “We’ve confirmed that this issue affects users globally. We’re reviewing monitoring telemetry and recent changes to isolate the source of the issue.”

Google Outage In April

April also saw a partial outage for Google Drive and cloud-based apps such as Google Docs for about three hours, leading to high latency and other issues for some users.

The Google Drive cloud storage service—and associated cloud apps including Google Docs and Google Sheets—suffered multiple service issues during the partial outage on April 12. Other Google services were not affected, including Gmail, Google Calendar and Google Meet.

While users could still access Google Drive, affected users could not create new documents and were “seeing error messages, high latency, and/or other unexpected behavior,” according to the company.

“We apologize for the inconvenience and thank you for your patience and continued support,” the company said on its Google Drive service details page.

Fastly Outage In June

On June 8, a service configuration issue at content delivery cloud provider Fastly impacted bulletin board website Reddit, video streaming service Twitch and a number of news sites including CNN and The New York Times.

Fastly confirmed in a Twitter post on June 8 that a service configuration “triggered disruptions” across its network globally. Fastly said it had moved to disable that configuration.

“Our global network is coming back online,” the company said in a Twitter post at about 7:15 a.m.

Downdetector.com showed huge spikes in complaints about outages at Hulu, Amazon and others.

Michael Goldstein, CEO of LAN Infotech, a Fort Lauderdale, Fla.-based solution provider, told CRN at the time that the global outage shows how critical it is for customers to properly architect their cloud and on-premises network.

“Cloud isn’t any different than on-premises—with both cloud and on-premises you need to make sure you have the right architecture,” Goldstein said. “We make sure that when we put mission-critical applications in [Microsoft] Azure for our customers we have multiple data center regions to prevent an outage like this. You need a fail-safe and a continuity plan to prevent outages. A lot of it is dependent on how much the client is willing to pay for continuity services.”

More Microsoft Issues

June also saw issues for tech giant Microsoft. Teams’ calling service sent calls straight into some users’ voicemails.

At about 3 p.m. ET on June 11, the Microsoft 365 Status Twitter account disclosed that Microsoft was investigating reports of an issue that was sending incoming calls “straight to voicemail” in Teams.

Subsequently, the account tweeted that Microsoft “isolated a recent change that has caused portions of infrastructure to send some Microsoft Teams calls straight to voicemail.”

Rosalyn Arntzen, president and CEO of Redmond, Wash.-based Amaxra, a Microsoft Gold partner, told CRN at the time that, over the past few years, Microsoft had gotten “dramatically better” at updating partners “as soon as they are aware of an issue and listing when they expect the issue to be solved—or at least provide a status.”

Akamai Outage, June 17

Nine days after the Fastly outage, a system issue with Cambridge, Mass.-based Akamai Technologies caused internet outages for global airlines, banks, and stock exchanges. The company saw service disruptions for its hosting platform, which helps defend against Distributed Denial-of-Service (DDoS) attacks.

The disruption affected several large companies around the globe, including Southwest Airlines, United Airlines, Commonwealth Bank of Australia, Westpac Bank, and Australia and New Zealand Banking Group, as well as the Hong Kong Stock Exchange’s website. Services for many of the companies impacted were restored within the day.

Downdetector.com showed spikes in complaints about service outages for websites of companies inside the U.S. as well as in a number of other countries including Australia, Germany and India.