CrowdStrike: More Testing, Staged Rollouts Now In Place For Updates
The cybersecurity vendor released its technical root cause analysis for the faulty July 19 update that caused a massive Windows outage.
CrowdStrike has released further analysis of the faulty July 19 update that caused a massive global IT outage, detailing how testing and staged rollouts of updates should help to prevent such issues in the future.
The cybersecurity vendor on Tuesday released its technical root cause analysis for the outage, which led to a “blue screen of death” for 8.5 million Microsoft Windows devices and had widely felt societal impacts for several days.
In addition to disclosing technical details about the “out-of-bounds memory read” issue that caused the system crash, CrowdStrike pinpointed several measures that have been implemented in response to the outage.
The cybersecurity vendor singled out additional testing and staggered rollouts of updates to its Falcon platform. CRN has reached out to CrowdStrike for further comment.
On testing, CrowdStrike said in the root cause analysis that its content configuration system has been “updated with new test procedures.” The change follows the finding that the testing system in place on July 19 did not catch the issue that caused the outage: a “mismatched number of inputs” in the update.
The new procedures will provide for “additional testing prior to production deployment” for this type of update, CrowdStrike said.
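To illustrate the kind of pre-deployment check that can catch a mismatched number of inputs, here is a minimal Python sketch. It is not CrowdStrike's code: the `ContentUpdate` type, the `validate_update` function and the field counts are hypothetical, chosen only to show how a reference to an input that was never supplied can be rejected before an update ships rather than read out of bounds at runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentUpdate:
    """Hypothetical content update: a name plus the input fields it reads (0-based)."""
    name: str
    referenced_fields: tuple[int, ...]


def validate_update(update: ContentUpdate, supplied_inputs: int) -> list[str]:
    """Return a list of problems; an empty list means the update passed this check."""
    problems = []
    for index in update.referenced_fields:
        # Any index outside 0..supplied_inputs-1 would be an out-of-bounds read.
        if index < 0 or index >= supplied_inputs:
            problems.append(
                f"{update.name}: references input {index}, "
                f"but only {supplied_inputs} inputs are supplied"
            )
    return problems


# Assumed example data: three inputs are supplied, so valid indexes are 0, 1 and 2.
good = ContentUpdate("update-a", (0, 1, 2))
bad = ContentUpdate("update-b", (0, 3))

print(validate_update(good, supplied_inputs=3))
print(validate_update(bad, supplied_inputs=3))
```

In this sketch the second update is flagged because it reads a fourth input when only three exist, which is the general shape of a count mismatch between what an update expects and what the software provides.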
Meanwhile, CrowdStrike’s probe into the cause of the outage also found that such updates “should be deployed in a staged rollout.” As a result, the vendor said that its content configuration system “has been updated with additional deployment layers and acceptance checks.”
Staged deployment, according to CrowdStrike, mitigates the impact if a new update ends up causing failures.
Updates that have passed initial testing “are to be successively promoted to wider deployment rings or rolled back if problems are detected,” the company said. “Each ring is designed to identify and mitigate potential issues before wider deployment.”
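The promote-or-roll-back pattern CrowdStrike describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the vendor's deployment system: the ring names, the `staged_rollout` function and the health check are all hypothetical.

```python
from typing import Callable, Sequence


def staged_rollout(
    rings: Sequence[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Promote an update ring by ring; roll back every deployed ring on failure."""
    deployed: list[str] = []
    for ring in rings:
        deploy(ring)
        deployed.append(ring)
        if not healthy(ring):
            # Stop promotion and undo the update, most recent ring first.
            for done in reversed(deployed):
                rollback(done)
            return False
    return True


# Assumed example: the update misbehaves in the second of three hypothetical rings.
log: list[str] = []
ok = staged_rollout(
    rings=["internal", "early-adopters", "general"],
    deploy=lambda ring: log.append(f"deploy:{ring}"),
    healthy=lambda ring: ring != "early-adopters",
    rollback=lambda ring: log.append(f"rollback:{ring}"),
)
print(ok)
print(log)
```

Because the failure is detected in the second ring, the widest ring never receives the update, which is the containment effect staged deployment is meant to provide.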
In comments posted Tuesday on the vendor’s Remediation and Guidance Hub page, CrowdStrike CEO George Kurtz thanked partners and customers who “mobilized immediately to restore systems” after the outage. “We could not have accomplished so much, so quickly, without your collaboration,” Kurtz said.
The CrowdStrike CEO noted that approximately 99 percent of Windows sensors for Falcon were online as of Thursday. He added: “To our customers that are still affected, please know that we will not rest until all systems are restored.”
Overall, “we are deeply sorry for the impact this had on you. Nothing is more important than regaining your trust and confidence,” he said.