CrowdStrike Exec: ‘More Oversight’ Now In Place To Prevent Repeat Of Massive IT Outage
In testimony before a U.S. House Homeland Security subcommittee, CrowdStrike’s Adam Meyers said the July outage has prompted a major shift in how the vendor approaches content updates impacting the Windows kernel.
CrowdStrike has overhauled its approach to deploying threat-related content updates that impact the Windows kernel in the wake of the massively disruptive July outage caused by the cybersecurity vendor, a CrowdStrike executive said during congressional testimony Tuesday.
Adam Meyers, senior vice president of Counter Adversary Operations at CrowdStrike, testified before a subcommittee of the U.S. House Homeland Security Committee about the circumstances surrounding the defective update that caused what experts have described as the largest IT outage in history.
In response to lawmaker questions focused on preventing a recurrence of the outage, Meyers said the incident has prompted a major shift in how the company tests and rolls out threat-related content updates for its Falcon platform. A faulty Falcon content update on July 19 led to the “blue screen of death” outage, which affected 8.5 million Windows devices and disrupted global air travel, health care and business for several days.
As a result of the outage, “we’re providing a lot more oversight and visibility into what that [update] is and how it goes out to the system,” Meyers told members of the House subcommittee on Cybersecurity and Infrastructure Protection.
Such content updates are deployed regularly — 10 to 12 times every day — in response to evolving threats, he noted.
Prior to the July incident, CrowdStrike treated these content updates differently from software code changes that impact the Windows kernel, he said. Code updates that affect the kernel — the core control center for Microsoft’s Windows operating system — have always gone through a rigorous testing process and phased deployment to prevent “blue screen of death” scenarios, according to CrowdStrike.
However, to prevent configuration changes from causing a repeat of the widely felt Windows crash in July, “we are now treating the content updates as code,” Meyers said. That means the content now goes through a series of tests and a staged rollout rather than being deployed to all customers simultaneously.
The outage was also enabled by a bug in CrowdStrike’s validation process for security content updates. The bug prevented the company’s Content Validator from catching an erroneous file in the July 19 update, leading to the crash of Windows systems running CrowdStrike’s Falcon platform.
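For readers who want a concrete picture of what "validate first, then roll out in stages" can look like, here is a minimal Python sketch. It is an illustration only and does not represent CrowdStrike's actual pipeline: the function names, the stage names, the `validate` and `deploy` callbacks and the shape of the update are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class RolloutResult:
    completed: bool                 # True only if every stage succeeded
    stages_deployed: List[str]      # stages that finished without a failure
    halted_at: Optional[str] = None # "validation" or the name of the failing stage


def staged_rollout(
    update: dict,
    stages: Sequence[Tuple[str, Sequence[str]]],
    validate: Callable[[dict], bool],
    deploy: Callable[[dict, str], bool],
) -> RolloutResult:
    """Validate an update, then deploy it stage by stage.

    The rollout stops at the first failed validation or failed host,
    so later (larger) stages never receive a bad update.
    All names here are hypothetical, for illustration only.
    """
    # Gate 1: the update must pass validation before any host sees it.
    if not validate(update):
        return RolloutResult(False, [], halted_at="validation")

    deployed: List[str] = []
    for stage_name, hosts in stages:
        for host in hosts:
            # Gate 2: any unhealthy host halts the rollout at this stage.
            if not deploy(update, host):
                return RolloutResult(False, deployed, halted_at=stage_name)
        deployed.append(stage_name)

    return RolloutResult(True, deployed)


if __name__ == "__main__":
    # Hypothetical stages, smallest first.
    stages = [
        ("canary", ["canary-1"]),
        ("early", ["early-1", "early-2"]),
        ("general", ["prod-1", "prod-2", "prod-3"]),
    ]
    # Hypothetical validator: the update must carry at least one rule.
    has_rules = lambda u: bool(u.get("rules"))
    # Hypothetical deploy step that reports every host as healthy.
    always_healthy = lambda u, host: True

    print(staged_rollout({"rules": ["r1"]}, stages, has_rules, always_healthy))
```

The point of the design is that a defect missed by validation still has a second chance to be caught on a small group of machines before the update reaches everyone.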
Meyers apologized multiple times for the incident during his testimony, and said CrowdStrike is fully committed to winning back the trust of customers and partners.
While the July incident was the result of a “perfect storm” of issues that coalesced to cause the outage, there’s no question that “we got it wrong in this case,” he said. “We are learning from what happened, and we've implemented changes to ensure that that doesn't happen again.”
Frequency Of Updates
During the hearing, U.S. Rep. Andrew Garbarino, chairman of the Cybersecurity and Infrastructure Protection subcommittee, questioned the cadence of content updates deployed by CrowdStrike. Garbarino cited comments from unnamed competitors of CrowdStrike, who he said have suggested that the frequency of CrowdStrike updates that affect the kernel is unusual for the security industry and ultimately “is not safe.”
CrowdStrike has no plans to slow the pace of content updates that it deploys, according to Meyers.
“We will continue to update our product with threat information as frequently as we need to in order to stay ahead of the threats that we're facing,” he said in response to the question from Garbarino.
“Speed does matter in this domain, in order to stay ahead of these threat actors,” Meyers said. “And the visibility that we get through the kernel, the performance that you gain through using the kernel, the ability to stop bad things, the enforcement mechanism that's provided through that kernel and the anti-tamper to stop a threat actor [are crucial].”