The 10 Biggest Cloud Outages Of 2011 (So Far)

The Cloud Will Go Down

If we've learned anything about the cloud in the first half of 2011, it's that the cloud can and will go down. There will be outages. The cloud is not immune. It turns out that cloud computing services and architectures are just as prone to downtime as their on-premises counterparts. And some of the biggest cloud players have suffered big-time downtime so far this year.

Here we take a look at the 10 biggest cloud outages of 2011 … so far.

Microsoft Windows Live Hotmail

While Microsoft's Windows Live Hotmail outage started on Dec. 30, 2010, it persisted until Jan. 2, 2011, when Microsoft finally gave the all clear.

Hotmail, the widely popular cloud e-mail service, suffered an outage that temporarily emptied the inboxes of more than 17,000 users. The outage lasted roughly four days. Users reported that when they logged into their accounts, they found that e-mails, folders and other data had vanished and could not be recovered. While Microsoft said the Hotmail hiccup was fixed by Jan. 2, some users said problems persisted for at least two more days.

Microsoft said a load balancing issue knocked out the cloud e-mail service to affected users.

Jive Software

Several hundred Jive Software users' wikis went down in January in a cloud outage triggered by a data center glitch. According to a Jive blog post detailing the outage, the trouble started at a SunGard-owned data center in Aurora, Colo., where Jive hosts some of its blogs, wikis and other offerings. Jive said the downtime was caused by a hardware failure in a storage system.

At least 500 Jive Software customers had their wikis thrown offline on a Friday after the provider of cloud-based social media platforms suffered the major data center outage, sources said.

Google Gmail

Google's widely popular cloud e-mail service Gmail suffered a massive outage in late February 2011 that wiped out thousands of Gmail inboxes. Gmail users awoke to find that messages, folders and other data had vanished from their accounts. At its peak, the outage affected roughly 150,000 Gmail users.

In the days that followed, Google apologized for the outage, calling it a "scare." Google said a software bug that was introduced by a storage update had caused the downtime.

Google Gmail was back to full service within a few days.

Intuit

A number of Intuit's hosted services for SMBs were racked by a string of service outages in late March 2011. The outages occurred on a Monday, a Tuesday and a Friday, but many users reported issues lasting the entire week. Popular cloud-based Intuit services like QuickBooks Online, QuickBooks Online Payroll and Intuit Payments Solutions conked out during the outages, which were blamed on errors introduced during maintenance operations.

Amazon Web Services

On April 21, Amazon Web Services' cloud offerings suffered sweeping outages and service interruptions for customers using Amazon's Northern Virginia data center, home to the affected Availability Zone. Service hiccups from Amazon's cloud outage persisted for several days for some customers, angering users. Amazon's lack of communication around the cloud outage prompted calls for transparency.

Amazon said its Elastic Block Store (EBS) service got stuck in a "re-mirroring storm" in its Northern Virginia data center. The hiccup knocked several Amazon cloud users offline. More than a week after the initial downtime, Amazon apologized for the cloud outage and offered users a cloud credit.

VMware Cloud Foundry

VMware's Cloud Foundry development platform was racked by a pair of separate blackouts in the same week, on April 25 and April 26.

While still in beta, the open source Cloud Foundry service was knocked out of commission on April 25 around 5:45 a.m. by a power outage that affected a storage cabinet power supply. The following day, around 10:15 a.m., an engineer who was developing an early detection plan to prevent outages like the one the previous day knocked Cloud Foundry offline with an errant keyboard tap. That tap took out all load balancers, routers and firewalls; caused a partial outage to portions of the internal DNS infrastructure; and resulted in a complete external loss of connectivity to Cloud Foundry.

Yahoo Mail

Yahoo Mail, the search company's massive cloud-based e-mail service, went down on April 28. Yahoo could not say exactly how many users were affected when its popular e-mail service was down for several hours, but it estimated the figure at more than 1 million of Yahoo Mail's more than 250 million users.

Yahoo didn't say what caused Yahoo Mail to go dark for that several-hour stretch, but said no e-mail data was lost or at risk during the disruption.

Microsoft BPOS: Round 1

Between May 10 and May 13, Microsoft Business Productivity Online Service (BPOS) suffered a string of cloud outages that caused lengthy cloud e-mail delays for BPOS users.

Trouble started around 12:30 p.m. on Tuesday, May 10, when the BPOS-S Exchange service experienced an issue with one of the hub components due to malformed e-mail traffic on the service. Microsoft said Exchange features a built-in capability to handle malformed traffic but "encountered an obscure case" where that capability also didn't work correctly, creating a backlog of e-mail. The issue caused delays of six to nine hours.

Then, on May 13, more issues caused e-mail delays, resulting in more than 1.5 million e-mail messages getting stuck and awaiting delivery. Microsoft fixed that issue by 3:04 p.m. and all e-mails were cleared within a few hours.

Microsoft BPOS: Round 2

On May 19, Microsoft's Exchange Online cloud e-mail service, part of BPOS, suffered a software problem that caused intermittent e-mail delays for customers in the Americas. Microsoft said fewer than one percent of customers were affected by the e-mail delays, which began at 8:48 a.m. when monitoring systems detected abnormally large e-mail queues in 30 percent of Exchange Online hub servers. By 9:54 a.m., e-mail queues had fallen to normal levels on all but one hub server, and at 11:21 a.m., Microsoft's BPOS, Exchange and Forefront Engineering teams identified the software problem causing the issue. Microsoft fixed the software problem by adding a single new hub server that relieved the backlog and restored the free flow of e-mail by 3:33 p.m.

Microsoft BPOS: Round 3

Microsoft BPOS suffered its fourth outage in just over a month on June 22. The service was knocked offline for more than two hours and took the Online Services Health Dashboard down with it, leaving users with nowhere to check the status of the problem.

Throughout the outage, Microsoft kept affected BPOS customers up to speed via social networks like Twitter and Facebook.