The world as we know it has become increasingly reliant on digital connections that operate quietly and invisibly in the background. So how did a single software update take down half the internet?
The global IT outage of July 19, 2024 is a stark reminder of our vulnerability to technological failures. Caused by a single faulty software update provided by cybersecurity company CrowdStrike, the outage had a devastating impact on airlines, media, banks, and retailers around the world, particularly those that use the Microsoft Windows operating system.
Described as “the largest IT outage in history”, the incident is a reminder of the extensive network of IT interconnections that underpins our digital infrastructure and the far-reaching impacts that can occur when something goes wrong.
What began as delays at airports escalated into widespread flight cancellations. The turmoil in aviation not only upset flight schedules but also affected global supply chains that depend on air cargo, showing how far a single failure can spread through the modern IT ecosystem. Meanwhile, many television and radio stations were taken off the air, and supermarkets and banks stopped operating.
Preliminary analysis suggests the disruption stemmed from a software update to CrowdStrike’s Falcon Sensor security software running on Microsoft’s Windows operating system. Employees at companies using CrowdStrike were met with a “blue screen of death” when attempting to log in.
The outage exposed not only the web of hidden dependencies that underpin our digital society and economy, but also the geopolitical dimensions of those dependencies. Countries with strong ties to Microsoft and CrowdStrike were hit hardest, while companies in countries such as China, where IT infrastructure is relatively isolated and controlled, appeared to be less affected.
Amid rising geopolitical tensions in recent years, an increasing number of countries, including China, have been proactively upgrading their cybersecurity measures and digital infrastructure, which may have mitigated the impact of this incident.
China’s focus on using domestic technology and its reduced reliance on foreign technology may also have contributed to the limited impact on its systems. This incident is a stark reminder that technological reliance can lead to geopolitical vulnerability, and reinforces the need for government officials to consider not only the economic impact of IT partnerships, but also the strategic and geopolitical implications.
The recovery and its effects
How affected industries responded to this crisis reflects both the strengths and vulnerabilities of their security and disaster recovery strategies. The key issues have been identified and reportedly fixed, but recovery will be slow, which highlights the significant challenges involved in restoring continuity of services within complex and deeply interconnected digital ecosystems.
It is particularly surprising that a staged software deployment, in which an update is released to a small group of machines before it reaches everyone else, was not used despite many past lessons, such as the TSB IT migration disaster that affected millions of the UK bank’s customers in 2018.
The absence of this fundamental and critical strategic step in IT management has exposed vulnerabilities in systems that many assumed were robust, and raises serious questions about the durability of both Windows operating systems and the CrowdStrike cybersecurity measures that are supposed to protect them.
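To make the idea of a staged deployment concrete, here is a minimal sketch in Python. It is illustrative only and does not describe how CrowdStrike or Microsoft actually ship updates. The function name `staged_rollout`, the `apply_update` callback, the wave sizes, and the 2% failure threshold are all assumptions made for this example.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], bool],
    stage_fractions: Sequence[float] = (0.01, 0.10, 0.50, 1.0),  # assumed wave sizes
    max_failure_rate: float = 0.02,  # assumed tolerance per wave
) -> dict:
    """Push an update in growing waves and halt if a wave looks unhealthy."""
    updated = 0
    for fraction in stage_fractions:
        # Cumulative number of hosts that should be updated once this wave ends.
        target = max(1, round(len(hosts) * fraction))
        wave = hosts[updated:target]
        if not wave:
            continue
        # apply_update is a hypothetical callback: True means the host stayed healthy.
        failures = sum(1 for host in wave if not apply_update(host))
        updated = target
        if failures / len(wave) > max_failure_rate:
            # Stop here so the remaining hosts never receive the bad update.
            return {"status": "halted", "updated": updated, "failed_in_wave": failures}
    return {"status": "complete", "updated": updated, "failed_in_wave": 0}
```

Under these assumptions, a faulty update pushed to a fleet of 1,000 machines would stop after the first wave of 10 rather than reaching all 1,000.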
Additionally, this incident highlights the strategic risks of relying on a single source of technology. The outage demonstrates how important a diverse set of technology partners is to national security and economic stability, and it raises concerns that adversaries could exploit such vulnerabilities. It also adds new urgency to international cybersecurity cooperation and policy interventions.
As services begin to stabilize and resume, this outage should serve as a wake-up call for IT professionals, business leaders, and policymakers. It highlights the urgent need to reevaluate and even overhaul existing cybersecurity strategies and IT management practices. Improving the resilience of systems to withstand major disruptions must be a top priority.
This global IT outage also marks a critical juncture for discussions about the future of digital resilience and technology governance at the business, infrastructure, and policy levels.
What about AI?
Another open question remains: if a single software bug can take down airlines, banks, retailers, media, and more around the world, are our systems ready for AI?
Rather than rushing to release chatbots, we may need to invest more in improving software reliability and methodologies. An unregulated AI industry is a recipe for disaster, especially in a world of rising geopolitical tensions.
Embracing emerging technologies like AI and blockchain is important, but so is getting the basics right. Cybersecurity operators need to ensure that basic IT management and maintenance practices are strong, reliable, and able to handle anything from a cybersecurity attack to a simple software update.
Lessons learned from this incident will undoubtedly influence future strategies in IT infrastructure development and crisis management.