In case you were living under a rock: last week, a large portion of the world's Windows computers just… stopped working. As I am sure all of you know, the cause was a faulty update from the security company CrowdStrike. While the bug itself was patched rather quickly, its lasting effects weighed on the stock market for quite a while and brought our global economy to a brief, but still incredibly damaging, halt. Nearly everyone knows this, but behind it lies a story of corporate battles, government regulations, the passing of blame, and a single question: how can we avoid a similar disaster in the future?
Travelers waiting in line after the CrowdStrike outage shut down their planned flights. This actually happened to my uncle and aunt.
A Bit of Backstory
CrowdStrike is no stranger to the security industry. Founded in 2011, the company has been central to several crucial cybersecurity incidents over the past decade. Just three years after its founding, it helped the United States government uncover and stop the activities of Russian and Chinese hacker groups and helped trace the 2014 Sony hacks to North Korea.
Its most prominent moment (prior to last week, of course), however, was its investigation into the 2016 Democratic National Committee (DNC) hack. CrowdStrike was the company that made the now-famous announcement accusing Russia of involvement in the DNC hacks. This eventually led to then-President Donald Trump asking Ukrainian President Zelenskyy to investigate CrowdStrike, along with Hunter Biden, the son of then-presidential candidate (and current President) Joe Biden, in the infamous call that led to Trump's first impeachment.
All of this is to say that CrowdStrike was quite prominent even before the crash, having become a leading security software provider across the major operating systems, including Windows, Linux, and macOS, with nearly all major tech companies among its customers.
The Crash
On the 19th of July, 2024, CrowdStrike released a new update for its Falcon security platform, with a version for each operating system it supports. Unfortunately, the update had a crucial fault that would cripple the Windows systems that used it. It is not that there were no warning signs: several updates earlier in the year had crashed the kernel on specific Linux distributions (the kernel is the core of an operating system, effectively the translator between the programs you run and the computer's hardware, in non-computer-science speak). However, since these crashes hurt only specific Linux communities, CrowdStrike seems not to have paid much attention to them.
For the July crash, however, a dangerous bug directly caused the Windows kernel to crash. According to CrowdStrike's preliminary reports (their official findings have yet to be unveiled), this was caused by a simple bug in one of their rapid response security updates. The bug introduced a logic error into channel file C-00000291.sys, causing Windows systems running the Falcon platform to crash into the blue screen of death.
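Why did this bug not affect Macs? The answer dates back to EU requirements that Microsoft give third-party security vendors the same access to the Windows kernel that its own products have, meant to keep Microsoft from monopolizing Windows software. Apple is under no such requirement and keeps third-party software largely out of the macOS kernel.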
The crash instantly paralyzed the world. Since Windows is the world's most used desktop OS, nearly every company that ran Windows with CrowdStrike's security system was completely shut down. In particular, airlines and banking systems were crippled, and our fragile, interconnected global economic and cultural system witnessed a minor instance of systems collapse.
As CrowdStrike worked on a fix, a simple workaround was found: booting the PC into safe mode and deleting the corrupted file. Unfortunately, the largest companies, those most affected by the bug, do not let employees enter safe mode themselves unless required. The end result was that IT departments (already shorthanded in several companies due to layoffs) faced a surge of pressure that was incredibly difficult to manage.
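The "delete the corrupted file" step of the workaround described above could be scripted roughly as follows. This is only a sketch under stated assumptions: the function name `remove_channel_files` is hypothetical, the folder holding the channel files is supplied by the caller rather than hard-coded, and the wildcard in the default pattern is an assumption that the file may carry a longer suffix after the C-00000291 prefix. It defaults to a dry run so that nothing is deleted until the matches have been reviewed.

```python
from pathlib import Path


def remove_channel_files(driver_dir, pattern="C-00000291*.sys", dry_run=True):
    """Return the files in driver_dir matching pattern, deleting them unless dry_run.

    driver_dir: folder that holds the channel files (assumed, passed in by the caller).
    pattern:    glob for the faulty channel file (the wildcard is an assumption).
    """
    matches = sorted(Path(driver_dir).glob(pattern))
    for path in matches:
        if not dry_run:
            # Only the matching channel file is removed; other files stay put.
            path.unlink()
    return matches
```

Calling it once with the default `dry_run=True` lists what would be removed; calling it again with `dry_run=False` performs the deletion. The machine would still need to be booted into safe mode first, which is exactly the step most employees could not do on their own.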
Eventually, most companies came back online, but the ensuing scare meant that the only slightly positive earnings announced by major companies soon afterward were not enough to pull the economy out of its tailspin. The estimated loss in productivity was nearly a billion dollars, not to mention the many individuals kept from their loved ones by delayed flights and the customer service teams that struggled in the aftermath of the crash. This was a mess; what can we learn from it?
Lessons From a Multi-Part Disaster
Unlike my Watcher article, which only offered lessons from a PR perspective, the CrowdStrike debacle offers practical advice regardless of your career.
The basics always apply: No software has ever been made without bugs or errors. From your middle school attempt to create Tic-Tac-Toe in Java to the building of advanced triple-A games, there will always be bugs. The first lesson anyone learns in coding is to check for bugs. It is a lesson so basic that it was overlooked: no one, be it CrowdStrike's team or the IT departments updating their companies' security software, properly checked the new Falcon update. Grand ambitions are rarely felled by anything other than the basics, and this goes for coding, too. Never forget to clean your code and double-check for errors. Remember to check for the basics in your career, whatever they are.
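One common way to "check the basics" before an update reaches everyone is a staged rollout, where a small group of machines receives the update first and the rest only follow if that group stays healthy. The sketch below is purely illustrative and is not how CrowdStrike or any particular IT department deploys updates; `staged_rollout`, `apply_update`, `is_healthy`, and `canary_fraction` are all hypothetical names chosen for this example.

```python
import math


def staged_rollout(hosts, apply_update, is_healthy, canary_fraction=0.1):
    """Update a small canary group first; continue only if every canary stays healthy.

    Returns the list of hosts that actually received the update.
    apply_update and is_healthy are caller-supplied callbacks (hypothetical).
    """
    if not hosts:
        return []

    # Always test on at least one machine, even for tiny fleets.
    canary_count = max(1, math.ceil(len(hosts) * canary_fraction))
    canaries, rest = hosts[:canary_count], hosts[canary_count:]

    for host in canaries:
        apply_update(host)

    if not all(is_healthy(host) for host in canaries):
        # Halt here: the remaining hosts never receive the faulty update.
        return canaries

    for host in rest:
        apply_update(host)
    return hosts
```

With a check like this, a bad update still hurts, but it hurts a handful of machines rather than the entire fleet.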
Anti-monopolization efforts were not at fault; monopolization was: After the crash, Microsoft blamed the EU, arguing that the anti-monopolization efforts that forced Microsoft to open up its Windows kernel were at fault for the disaster. In my opinion, it was the concentration of the OS market between Windows and macOS that made the crash as harsh as it was. If the market were anything but a duopoly, some companies in each of the crashed industries would have stayed up. With some clever coordination, they could have helped those industries stay afloat, even if they still suffered.
Dear company managers, NEVER underestimate the importance of IT departments and other such groups: Yes, the vast, vast, vast majority of the blame for this disaster being as big as it was goes to CrowdStrike, but it could have been mitigated much more quickly if some of the companies had taken the time to invest in and grow their IT departments, rather than viewing them as an unnecessary burden. Tech companies saw massive layoffs this year, often hitting IT or HR departments. The companies viewed these as expendable extras that were only needed in the rare bad situations that basically never happened. That is, until they did. If there is one lesson companies should take from this debacle (other than double-checking the basics), it would be to respect the importance of every department.
Know our world is fragile, and act accordingly: The CrowdStrike crash was a rare event, but not an impossibly rare one. Such disasters will occur in every industry from time to time, causing small waves or giant tsunamis. A clever individual, company, or organization diversifies its holdings, plans backup servers and databases, and saves crucial information in multiple places so that it doesn't get lost and operations can continue as normal.
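For the "save crucial information in multiple places" advice, even a few lines of scripting go a long way. The following is a minimal sketch, assuming the important data is a single file and the backup locations are ordinary folders (for example, a second drive and a synced cloud folder); the names `replicate` and `sha256_of` are hypothetical. Each copy is verified against the original's SHA-256 checksum, because a backup that silently differs from the original is not really a backup.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of the file at path."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def replicate(source, destinations):
    """Copy source into each destination folder and verify every copy.

    Returns the paths of the verified copies; raises OSError on a mismatch.
    """
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for directory in destinations:
        target_dir = Path(directory)
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / source.name
        shutil.copy2(source, target)
        if sha256_of(target) != expected:
            raise OSError(f"Copy at {target} does not match the original")
        copies.append(target)
    return copies
```

Reading the whole file into memory keeps the sketch short; for very large files, hashing in chunks would be the more careful choice.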
These tips may be basic, but that is the point I am trying to make. When the basics are forgotten, these simple crashes can cause much more harm. Hopefully, most companies and individuals will now remember the basics. In addition to that, I hope the market stabilizes. Much may be said about how each individual industry attempted its recovery, and the lessons we can learn from that, but that is a story for another day, if everyone is interested. For now, I hope for the best for the future.