By Ellie Franco
Last week the world experienced one of the biggest IT failures in history, when the cybersecurity company CrowdStrike pushed out a software update that crashed computers globally.
The systems are now back online, but how worried should we be about future outages that could grind society to a halt?
Last Friday, I was sweaty, stressed and seemingly skint in Bangkok - I was unable to book a taxi to the airport ahead of my 19-hour multi-transit flight home to New Zealand.
I borrowed cash from a friend and managed to get to the airport with less than an hour till boarding time - where I quickly realised I wasn’t alone.
The CrowdStrike IT outage had halted computer systems worldwide - about 8.5 million Microsoft Windows devices had crashed.
Banks, airports, hospitals, TV stations and supermarkets were brought to a standstill in an event that computer science experts say could have been avoided if back-up systems had been in place.
Given our reliance on IT systems, how likely is it that society could grind to a halt at any moment?
David Tuffley, cybersecurity expert and senior lecturer at Australia’s Griffith University, says an apocalyptic end-of-the-web scenario is unlikely.
“The internet is actually a very reliable machine. The reason why this was so catastrophic is that credit card processing is integrated into so many different things now, that really everything that had the CrowdStrike Falcon product installed on it, had a problem - they’re the computers that fell over.”
Tuffley says to break the internet for a significant period of time would require an incredibly rare geomagnetic storm.
Such a storm, known as the Carrington Event, took place in 1859. It gave telegraph operators electric shocks, caused equipment to spark and set some telegraph stations ablaze across Europe and North America.
If this were to happen today, the physical infrastructure of the internet would be fried and would need to be gradually rebuilt over months, with data restored from back-ups not damaged in the storm.
“But that’s most unlikely,” Tuffley says.
Allyn Robins, AI lead of the New Zealand-based technology think tank Brainbox Institute, compared the CrowdStrike situation to a different event - the 1962 loss of an $80 million NASA rocket due to a single missing hyphen in its code.
“That's why it's crucial to have extremely robust security and quality assurance practices in place, and that's where CrowdStrike failed - if they'd been checking and fixing their code as carefully as they should have, this error never would have slipped through,” he said.
CrowdStrike chief executive George Kurtz issued a public apology on the day of the outage, saying it “was caused by a defect found in a Falcon content update for Windows hosts”.
But what if technology could be used maliciously to break the internet?
Tech experts believe the global outage happened because a process called “full regression testing” did not take place.
This hasn’t been publicly confirmed or denied by CrowdStrike yet.
Tuffley says CrowdStrike had issued a Falcon update that caused Windows PCs to fall over.
“In software engineering there’s a colourful word for it, they call it ‘a deadly embrace’ - when the bugged update got onto the Windows machine, it caused a logic error that caused Windows to crash.”
Logic errors are coding mistakes that only show up when the software runs - unlike a typo, they are not easily picked up by the human eye.
Full regression testing, which re-runs a program’s existing tests after a change to check that nothing has broken, would have caught the logic error that sparked the conflict between CrowdStrike’s Falcon update and the Windows operating system, Tuffley says.
Allyn Robins says that, although it is not impossible, it would be difficult for cyber criminals to take advantage of this specific fault directly.
“The fault shuts systems down, making it difficult to use them nefariously, or at all,” he said.
However, some phishing scammers did take advantage of those affected by the outage, promoting fake support websites and posing as banks assisting frustrated customers.
Why back-up plans are so important
CrowdStrike is used by nearly 300 of the Fortune 500 companies.
Global agencies and the general public were unknowingly counting on the technology to work seamlessly through any background updates.
David Tuffley says a concept called “redundancy” is a critical component of both cybersecurity and overall business planning.
Diesel generators, for example, are a form of “redundancy” used in rural New Zealand to keep electricity flowing during power outages, maintaining customers’ trust in power companies - a concept called business continuity.
“It involves creating multiple layers of backup systems, processes, and resources to ensure that operations can continue smoothly even if one or more components fail.”
Cloud back-ups are an example of this, and would help avoid future outages and data loss and ensure resilience against cyber attacks, he says.
“The reason that there weren’t redundant backup systems already is that it costs money and time, and organisations rarely want to spend that unless they feel they have to,” Allyn Robins says.
So, both Robins and Tuffley say it is not impossible for the internet to go down for a long time, but it is highly unlikely.
What is more likely, they say, is a coming push to strengthen back-up systems across the planet.