Fortune
Sharon Goldman

A risky trade-off made CrowdStrike's outage so devastating — cybersecurity leaders say there's no easy fix

When Michael Armer's phone started blowing up at 4 a.m. on Friday, he "freaked out." Armer, the chief information security officer at RingCentral, was receiving notifications about a stunning computer outage that was knocking down airport, bank, and hospital tech systems like dominoes.

The scope of the chaos raised fears of a major cybersecurity breach or a state-sponsored attack. "That’s enough to get your blood flowing really quickly," Armer said.

It turns out that the massive computer outage was not the work of nefarious hackers. It was the result of a glitch in a routine software update by security company CrowdStrike. "We were all very fortunate that this was related to one of their standardized and automated software deployments," Armer said of the CrowdStrike update snafu.

But along with the relief that the disruption was not a cyberattack, the incident has highlighted the fragility and frightening interconnectedness of the technology modern society depends on. It has also exposed the danger posed by today's convoluted system of software updates, which, security experts say, stretches staff thin at even the largest organizations and forces a constant balancing act of risky trade-offs.

The problem with patches

Security vendors like CrowdStrike provide “patches,” or software updates, when threats are detected. Given the number of hackers probing companies' systems and devising new lines of attack, the need for patches is constant, sometimes arising several times a day. Organizations move quickly and often automate these updates to ensure that there are no holes in their protective shields.

The problem is that new software is like an untested pharmaceutical drug: each new line of code could have a bug or defect that causes problems, unexpected side effects, or dangerous interactions with other software. Ideally, a company would take the time to test each software update before deploying it to all of its computers.

“It’s a really difficult conundrum, you cannot keep up with the number,” said a CISO at a top law firm in New York City. "Sometimes you have to put out a security patch because it’s critical and you’ve got vendors breathing down your neck and there’s no way to [test] it,” he said. “Sometimes there are several updates within a 24-hour period so you’d be caught in a recursive circle of testing where you would just never be done.”

For many in-house security teams, that means striking a balance between speed and risk. “The antivirus products are pushing up multiple updates per day because in some ways we've pushed them into a corner,” said Paul Davis, field CISO at software supply chain platform JFrog. "The faster that they can respond to detect a piece of software or malicious activity, the better they are. So that being the case, then the requirement to test multiple times a day becomes onerous.” 

The real challenge, he said, is protecting the organization against cybersecurity threats that can spread in hours, or even minutes, while also making sure those software updates are tested. “We have to test the basic functionality of the software, but we rely on these automated updates to be safe, and it’s almost like a calculated risk.”

Hands-on CPR for each affected computer

The New York City law firm uses more than 30 separate security tools from a variety of vendors that run on laptops, desktops or servers. Normally, if an update causes problems, the software vendor will deploy a fix that an organization can quickly push to thousands of computers within the same day.

Because of the nature of the CrowdStrike flaw, however, that wasn't possible. The flaw caused computers running Microsoft Windows to crash and display the dreaded "blue screen of death." Affected systems had to be brought back to life one by one.

“You have to physically walk over to every computer and power it down and then bring it up, and when the screen comes up, you have to hit F3 to go into what they call Safe Mode and then go and delete a file somewhere,” the New York law firm CISO explained. “It's just a nightmare.” 

Some CISOs, however, put the bulk of the blame on Microsoft rather than CrowdStrike, and some even avoid Windows altogether if they can. “In Silicon Valley, tech companies tend to avoid Windows,” said the CISO of a medium-sized AI company, who requested anonymity due to the sensitivity of discussing security mitigations. He attributed Windows' exposure to malware and spyware, as well as the driver instability triggered Friday by CrowdStrike's flawed update, to the design of the operating system's core architecture.

“CrowdStrike has clear process improvements to make, obviously, but it should not be possible in 2024 to have a kernel [core architecture] which is destabilized by a third party,” he said. “Microsoft has had a bad year, from a security perspective, and they have to win the trust of the ecosystem back.” Microsoft did not respond to a request for comment other than pointing to its existing statement about the outage.

In a statement posted online Friday, CrowdStrike CEO George Kurtz apologized for the incident, which he said involved a "content update for Windows hosts," noting that Mac and Linux hosts were not affected.
"All of CrowdStrike understands the gravity and impact of the situation. We quickly identified the issue and deployed a fix, allowing us to focus diligently on restoring customer systems as our highest priority."

Post-game analysis

JFrog’s Davis pushed back on the idea that a typical organization could get away with not using Windows. “Windows is still the predominant operating system,” he said. “When you join a company, you’re [usually] offered either a Windows machine or a Mac machine.” 

John Paul Cunningham, CISO at identity security company Silverfort, said that Friday's outage should be a wake-up call for organizations and make companies more leery of automated software updates. In Cunningham's view, not all threats are created equal, and companies can exercise more discretion rather than always defaulting to automated updates.

“Companies like CrowdStrike often suggest doing auto updates with this premise that staying on the most current release of the product is more secure,” he said. But companies can take more time to test an update before pushing it out, he said, even if that takes a little more work. “As long as the security team knows there is an update, they can push it out manually; the update itself is still automatic.”

The bottom line, said RingCentral's Armer, is that for most cybersecurity leaders, striking a balance between risk and speed, and between operating systems, will require some post-game analysis and decision-making.

And while getting a grip on software updates is important, he noted that companies should also be thankful Friday's outage was not even worse. “I personally am thankful that it wasn't a state-sponsored attack," he said.
