CrowdStrike claims that a bug in its testing software was the cause of last week’s massive crash.
The remediation guide, updated on Wednesday, added a preliminary post-incident review (PIR) that provides the vendor’s perspective on how it took down 8.5 million Windows boxes.
The explanation begins by saying that CrowdStrike’s Falcon Sensor comes with “sensor content” that defines its capabilities, and the software is updated with “Rapid Response content” that allows it to detect and gather intelligence on new threats.
Sensor content relies on “template types,” which are code with predefined fields that threat detection engineers can use in their Rapid Response content.
Rapid Response content is delivered as “template instances,” which CrowdStrike describes as “instantiations of a particular template type.”
Each template instance maps to a specific behavior that the sensor software monitors, detects, or prevents.
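For readers who think in code, the relationship between a template type and a template instance can be sketched roughly as follows. This is only an illustration of the idea as the review describes it: the class names, field names, and matching logic are all hypothetical, and nothing here reflects CrowdStrike's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateType:
    """Hypothetical stand-in for a template type: a name plus predefined fields."""
    name: str
    fields: tuple


@dataclass(frozen=True)
class TemplateInstance:
    """Hypothetical stand-in for a template instance: values for some of a type's fields."""
    template_type: TemplateType
    values: dict

    def matches(self, event: dict) -> bool:
        # An event matches when every configured field equals the event's value.
        return all(event.get(field) == value for field, value in self.values.items())


# Assumed example: an IPC-style type with two made-up fields.
ipc_type = TemplateType(name="ipc", fields=("pipe_name", "process_name"))
suspicious_pipe = TemplateInstance(ipc_type, {"pipe_name": "example_pipe"})
```

In this sketch the type ships rarely and defines what can be expressed, while instances are small pieces of data that can be pushed out often.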
In February 2024, CrowdStrike introduced a new “InterProcessCommunication (IPC) template type” that the vendor said was designed to detect “new attack techniques that exploit named pipes.”
The IPC template type passed testing on March 5, and a template instance was released to use it.
Three more IPC template instances were deployed between April 8 and April 24. All ran without crashing 8.5 million Windows machines, although Linux machines did experience issues with CrowdStrike in April, as we reported earlier this week.
On July 19, CrowdStrike deployed two more IPC template instances, one of which contained “problematic content data” but was deployed to production anyway due to what CrowdStrike described as a “content validator bug.”
The review does not discuss the role of the Content Validator in detail, so we will assume that it does exactly what the name suggests.
Regardless of what the validator does or is supposed to do, it did not prevent the release of the faulty July 19 template instance. This occurred because CrowdStrike assumed that, since the IPC template type delivered in March and the subsequent IPC template instances had passed testing, the July 19 release would be fine too.
History tells us that this was a wildly incorrect assumption: “The result was an out-of-bounds memory read, which raised an exception.”
“This unexpected exception could not be handled properly, causing the Windows operating system to crash.”
On approximately 8.5 million machines.
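To make the failure mode concrete, here is a minimal sketch of an out-of-bounds read and the kind of check that can catch one before release. It is not CrowdStrike's code, which the review does not show; the function names and the field-count rule are assumptions made purely for illustration.

```python
def read_field(values: list, index: int) -> str:
    """Read one field, refusing any index outside the supplied data."""
    if not 0 <= index < len(values):
        raise IndexError(f"field {index} requested, only {len(values)} supplied")
    return values[index]


def validate_instance(expected_fields: int, values: list) -> bool:
    """Hypothetical pre-release check: the data must supply exactly the expected field count."""
    return len(values) == expected_fields


# Assumed example: the consumer expects three fields but only two are supplied.
supplied = ["pipe_name", "process_name"]
print(validate_instance(3, supplied))  # the mismatch is visible before anything ships

try:
    read_field(supplied, 2)
except IndexError as err:
    print(f"caught: {err}")
```

In Python the bad read surfaces as an exception that the caller can catch. The review says the exception on Windows could not be handled properly, which is why a check that rejects bad content before it ships matters more than one that fires afterwards.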
The incident report includes promises to test future Rapid Response Content more rigorously, phase releases, give users more control over when they’re deployed, and provide release notes.
You read that right: Release notes. Calm your heartbeat.txt.
The report also includes a pledge to publish a full root cause analysis once CrowdStrike has completed its investigation.
Take as much time as you like. Some of us are still busy fixing the machines you broke.®