CrowdStrike Says Bug In Validation Process Led To Massive Outage

A bug in CrowdStrike’s content approval system ‘passed validation despite containing problematic content data,’ leading to the outage that caused global disruptions.

A bug in CrowdStrike’s validation process for security configuration updates resulted in the Microsoft Windows outage that led to global disruptions starting Friday, the company said Wednesday.

The widely felt snafu involved what the cybersecurity vendor called its “Content Validator,” which is the system intended to ensure that new configurations meant to secure customers don’t crash IT systems.

In the case of the update that rolled out to CrowdStrike’s Falcon platform just after midnight EDT Friday, the content validation system did not function properly, according to CrowdStrike.

“Due to a bug in the Content Validator, one of the [updates] passed validation despite containing problematic content data,” CrowdStrike said in a “Preliminary Post Incident Review” post on its blog Wednesday.

CrowdStrike did not specify what was responsible for the bug in the content validation system.

“We will be detailing our full investigation in the forthcoming Root Cause Analysis that will be released publicly,” the company said in the post.

CrowdStrike specified that the update that led to the outage involved what’s known as “rapid response content,” which is used as part of performing “behavioral pattern-matching operations” to thwart future cyberattacks.

The defective content in question had been stored within a “proprietary” binary file and was “not code or a kernel driver,” CrowdStrike said.

Going forward, CrowdStrike said that it plans to improve its testing for “rapid response content” deployments.

This will include staggering the deployments for rapid response content, improving monitoring of sensor and system performance and, crucially, providing customers with “greater control over the delivery of Rapid Response Content,” the company said in the post.

In the future, CrowdStrike said it plans to allow for “granular selection of when and where these updates are deployed.”
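For readers unfamiliar with the idea, a staggered deployment can be sketched in a few lines of Python. This is an illustration only, not CrowdStrike’s rollout mechanism, which the company has not described in detail. The function name staged_rollout, the stage_fractions parameter and the deploy callback are all hypothetical.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    hosts: Sequence[str],
    stage_fractions: Sequence[float],
    deploy: Callable[[str], bool],
) -> List[str]:
    """Deploy to growing slices of hosts and stop at the first stage with a failure.

    Hypothetical sketch: `deploy` returns True if the host is healthy after the update.
    Returns the hosts that were updated successfully.
    """
    updated: List[str] = []
    start = 0
    for fraction in stage_fractions:
        # Each fraction is the cumulative share of the fleet covered by the end of the stage.
        end = min(len(hosts), int(len(hosts) * fraction))
        stage = hosts[start:end]
        results = [(host, deploy(host)) for host in stage]
        updated.extend(host for host, ok in results if ok)
        if any(not ok for _, ok in results):
            break  # Halt before the update reaches the rest of the fleet.
        start = end
    return updated
```

With stages of 10 percent, 50 percent and 100 percent, a failure in the second stage would stop the rollout before it reached the remaining half of the machines.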

Routine Update

CrowdStrike also said in the post that the Windows sensor configuration update for Falcon was intended to “gather telemetry on possible novel threat techniques.”

“These updates are a regular part of the dynamic protection mechanisms of the Falcon platform,” the company said.

The cybersecurity giant’s defective configuration update led to the “blue screen of death” for Microsoft Windows systems worldwide on Friday and brought widespread disruptions to air travel, health care, banking and more. At least 8.5 million Windows devices were impacted by CrowdStrike’s update, Microsoft has said.

In the post Wednesday, CrowdStrike said that “problematic content” in a specific file “resulted in an out-of-bounds memory read triggering an exception.”

“This unexpected exception could not be gracefully handled, resulting in a Windows operating system crash (BSOD),” the company said.
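The general class of check that guards against an out-of-bounds read can be sketched as follows. This is a simplified illustration, not CrowdStrike’s Content Validator or its file format, neither of which has been published. The ContentRecord type and the validate function are hypothetical names, and the sketch assumes a record that carries a list of values plus the field indexes a rule intends to read.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContentRecord:
    """Hypothetical content record: supplied values plus the field indexes a rule will read."""
    values: List[str]
    referenced_fields: List[int]


def validate(record: ContentRecord) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems: List[str] = []
    for index in record.referenced_fields:
        # Reject any index that falls outside the supplied values.
        if index < 0 or index >= len(record.values):
            problems.append(
                f"field index {index} is out of bounds for {len(record.values)} values"
            )
    return problems


good = ContentRecord(values=["a", "b", "c"], referenced_fields=[0, 2])
bad = ContentRecord(values=["a", "b", "c"], referenced_fields=[0, 3])
```

Here validate(good) returns an empty list, while validate(bad) reports one problem because index 3 points past the end of a three-value record. A validator that skipped this comparison would let such a record through.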

Thousands of flights have been canceled by airlines since the outage began, while some hospitals reported postponing surgeries and some 911 systems were reportedly unavailable on Friday.

The disruptions have continued into this week, notably for beleaguered airline Delta. The airline canceled more than 460 flights Tuesday, according to CNN, for the fifth consecutive day of widespread Delta flight cancellations. That’s on top of more than 5,700 flights canceled by Delta between Friday and Monday, CNN reported. Delta has not responded to CRN requests for comment.

The cybersecurity vendor was ultimately trying to protect customers against increasingly sophisticated hackers when it rolled out the fateful update to Falcon, a solution provider partner told CRN Monday.