CrowdStrike Pins Massive Microsoft Outage On ‘Logic Error’ In Falcon Update
The company says it remains unclear exactly why the hugely disruptive error occurred, especially given that such updates are routine.
CrowdStrike said the unprecedented Microsoft outage, felt worldwide, stemmed from a programming error triggered as part of a routine update process, prompting further questions about how such a massively disruptive failure could have occurred.
The defective CrowdStrike Falcon update led to the meltdown of potentially millions of Microsoft Windows systems worldwide on Friday and hobbled much of what the modern world depends on, from air travel to health care to banking and beyond. Experts have called it the largest IT outage of all time.
In a blog post disclosing technical details late Friday, CrowdStrike identified a “logic error” as the culprit in the Microsoft outage. The programming error was triggered by a sensor configuration update to Falcon, a type of update the company issues frequently.
Such updates “are a normal part of the sensor’s operation and occur several times a day in response to novel tactics, techniques, and procedures discovered by CrowdStrike,” the company said in the post. “This is not a new process; the architecture has been in place since Falcon’s inception.”
The sensor configuration update that ultimately triggered the logic error was released to Windows systems shortly after midnight, EDT, on Friday, the company said in the post.
“Sensor configuration updates are an ongoing part of the protection mechanisms of the Falcon platform,” CrowdStrike said.
For a still-unknown reason, “this configuration update triggered a logic error resulting in a system crash and blue screen (BSOD) on impacted systems,” the company said.
“We are doing a thorough root cause analysis to determine how this logic flaw occurred,” CrowdStrike said.
In the IT world, logic errors are well known for producing “infinite loops,” in which a flawed exit condition keeps code running indefinitely, continually consuming CPU resources until the system crashes, known on Windows as a “blue screen of death.”
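To make that failure mode concrete, here is a minimal C sketch, our own illustration and not CrowdStrike’s code: the loop is meant to stop once every entry is processed, but a one-character slip in the exit test means the condition never becomes false, so the CPU spins until the process, or in kernel mode the entire system, is forcibly stopped.

    #include <stdio.h>

    int main(void) {
        int entries = 5;   /* number of items the loop should handle */

        /* BUG: the exit test uses "!=" while the counter advances by 2,
         * so it runs 0, 2, 4, 6, ... and never equals 5. The condition
         * never fails -- a classic logic error that loops forever.
         * Writing "processed < entries" would terminate correctly. */
        for (int processed = 0; processed != entries; processed += 2) {
            printf("processing entry %d\n", processed);
        }
        return 0;
    }

The point is the mismatch in scale: a slip this small in code running with kernel-level privileges can take down the whole machine rather than a single program.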
Stopping The Loop
Danny Jenkins, CEO of cybersecurity vendor ThreatLocker, told CRN Friday that his company helped respond to endpoint devices that were stuck in an infinite loop as a result of the defective CrowdStrike update. The roughly 22,000 devices were using endpoint security software from both ThreatLocker and CrowdStrike, Jenkins noted.
ThreatLocker’s team began exploring whether blocking the files that CrowdStrike had changed would stop the devices from looping, he said, and the approach largely solved the issue.
“The big challenge for MSPs is, they cannot get in front of the machines—because they have to physically go and touch the keyboard to boot in safe mode and delete the files,” Jenkins said, noting that CrowdStrike’s remedy involves deleting the problematic files delivered by the sensor update.
“So what we've said was, ‘How do we make it so they don't have to delete the files in front of the machine?’” Jenkins said.
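CrowdStrike’s widely reported manual fix amounted to booting each affected machine into safe mode or the Windows Recovery Environment and deleting the defective channel files by hand. The C sketch below automates just the deletion step; it is our illustration, not an official tool, and the “C-00000291*.sys” file pattern reflects CrowdStrike’s public remediation guidance at the time rather than anything stated in this article.

    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        /* Directory and file pattern per CrowdStrike's published
         * remediation guidance; adjust if Windows is installed
         * somewhere other than C:\Windows. */
        const char *dir = "C:\\Windows\\System32\\drivers\\CrowdStrike\\";
        char pattern[MAX_PATH], path[MAX_PATH];
        WIN32_FIND_DATAA fd;

        snprintf(pattern, sizeof(pattern), "%sC-00000291*.sys", dir);
        HANDLE find = FindFirstFileA(pattern, &fd);
        if (find == INVALID_HANDLE_VALUE) {
            puts("No matching channel files found.");
            return 0;
        }
        do {
            snprintf(path, sizeof(path), "%s%s", dir, fd.cFileName);
            if (DeleteFileA(path))
                printf("Deleted %s\n", path);
            else
                printf("Could not delete %s (error %lu)\n", path, GetLastError());
        } while (FindNextFileA(find, &fd));
        FindClose(find);
        return 0;
    }

In practice most admins ran the equivalent deletion by hand from a safe-mode command prompt; as Jenkins notes, the hard part was reaching the keyboard at all, since a machine caught in the crash loop cannot receive a remotely pushed fix.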
Recovery Underway
Speaking with NBC’s Today show Friday, CrowdStrike CEO George Kurtz said that the issue had been fixed but that “it could be some time” before a full recovery is possible.
The defective CrowdStrike software update led to thousands of canceled flights, curtailed health-care services such as surgeries, and 911 system outages.
“We’re deeply sorry for the impact that we’ve caused to customers, to travelers, to anyone affected by this,” Kurtz said.
More Technical Details
In its blog post late Friday, CrowdStrike also noted that the impacted files “are not kernel drivers” as some reports had suggested.
The sensor update that triggered the issue “was designed to target newly observed” and “malicious” communications infrastructure, the company said.
This infrastructure, or “malicious named pipes,” has been observed in use by common command-and-control frameworks as part of cyberattacks, CrowdStrike said.
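For readers unfamiliar with the term, a named pipe is a Windows inter-process communication channel that lives under the special \\.\pipe\ namespace, and security sensors flag pipes whose names match patterns used by known attack tooling. The short C sketch below, our own illustration rather than anything from Falcon, lists the named pipes currently open on a system, which is the kind of raw data such a detection examines.

    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        WIN32_FIND_DATAA fd;

        /* Enumerate the \\.\pipe\ namespace, which lists every
         * named pipe currently open on the machine. */
        HANDLE find = FindFirstFileA("\\\\.\\pipe\\*", &fd);
        if (find == INVALID_HANDLE_VALUE) {
            puts("Could not enumerate named pipes.");
            return 1;
        }
        do {
            printf("\\\\.\\pipe\\%s\n", fd.cFileName);
        } while (FindNextFileA(find, &fd));
        FindClose(find);
        return 0;
    }

According to CrowdStrike’s post, the update at issue added detection logic for newly observed malicious pipe names of exactly this kind.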