Last week’s IT chaos was caused by a bug in an anti-hacking software package, and many assumed that only Windows-based PCs were suffering Blue Screens of Death (BSODs). It turns out that CrowdStrike’s Falcon program has caused the same kind of crash on Linux systems, taking down client and server machines.
The fact that this isn’t just a Windows issue is certainly noteworthy, given the number of gleeful posts from Linux users floating around on Friday.
The news that CrowdStrike’s troubles are not limited to Windows installations confirms what was already suspected about the IT outage, reported by the Register, that spread across the globe last week: the fault lay not with Windows but with a completely different piece of software. The application in question, CrowdStrike Falcon, is essentially an anti-hacking and anti-malware package used by businesses large and small, as well as government agencies and services.
A buggy update to the program caused Windows PCs to experience stop errors (known as BSODs, or Blue Screens of Death) every time they tried to boot up. Microsoft responded quickly, creating a recovery tool to help repair affected computers. CrowdStrike CEO George Kurtz apologized profusely for the entire incident.
But behind the endless headlines showing BSOD photos is the less-publicized fact that Linux systems have also been affected by Falcon bugs, albeit in cases dating back a month before last week’s issue: Red Hat has identified CrowdStrike software as the cause of kernel panics (the Linux equivalent of a Windows stop error), and the Register notes that previous Falcon updates have caused similar issues in Debian and Rocky Linux.
Software bugs are so common that anyone who uses computers accepts them as part of the modern IT world. But there’s a big difference between a few glitches in an application and an operating system kernel that stops working, and that difference is even more significant given the widespread use of CrowdStrike’s software.
I have never been in a position where I had to manage a large computer network providing mission-critical services, but I have managed several smaller networks in the days when Windows and its updates were far less stable. Back then, we pushed each update to a single test machine and left the rest of the network on the previously tested update, to ensure that a change would not make the entire system unavailable.
I believe this is common practice, but given the magnitude of the impact of Friday’s Falcon update, it may not be as common as I think. I’m not saying that this issue is partly the fault of IT system administrators (the finger is certainly being pointed at CrowdStrike), but I can’t help but feel that if you’re managing systems that shouldn’t go down for any reason, you definitely shouldn’t be rolling out updates without testing them first.
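As a rough illustration of that test-machine-first approach, here is a minimal Python sketch. Everything in it is hypothetical: `staged_rollout`, `apply_update`, `is_healthy` and the machine names are stand-ins of my own, not part of CrowdStrike's, Microsoft's or any other vendor's tooling, and a real deployment would rely on whatever update-management system the network already uses.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class RolloutResult:
    updated: List[str]    # machines that received the update
    held_back: List[str]  # machines left on the previously tested version
    halted: bool          # True if a test machine failed its health check


def staged_rollout(
    machines: Iterable[str],
    apply_update: Callable[[str], None],  # hypothetical: installs the update
    is_healthy: Callable[[str], bool],    # hypothetical: e.g. "did it reboot cleanly?"
    canary_count: int = 1,
) -> RolloutResult:
    """Update the first `canary_count` machines one at a time and check each;
    stop at the first unhealthy one so the rest of the network is untouched."""
    machines = list(machines)
    for i, machine in enumerate(machines):
        apply_update(machine)
        # Only the leading test machines are health-checked before continuing.
        if i < canary_count and not is_healthy(machine):
            return RolloutResult(machines[: i + 1], machines[i + 1 :], True)
    return RolloutResult(machines, [], False)


if __name__ == "__main__":
    # Simulate a bad update: the test machine never comes back up healthy.
    result = staged_rollout(
        ["test-pc", "office-1", "office-2"],
        apply_update=lambda m: print(f"updating {m}"),
        is_healthy=lambda m: False,
    )
    print(result)
```

In this simulated bad update, only the test machine is touched and the other two stay on the known-good version, which is the whole point of the practice.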
It remains to be seen whether CrowdStrike’s outage will be the worst in history, but we do know a few things for sure: CrowdStrike’s market value will drop significantly, and IT managers will be extremely wary of the company’s software going forward.