Cloudflare CEO Apologizes for ‘Unacceptable’ Outage and Explains What Went Wrong – CNET
The Cloudflare outage on Tuesday that disrupted access to many websites and services — including OpenAI, Spotify, X, Grindr, Letterboxd and Canva — was the company’s worst outage since 2019, CEO Matthew Prince says.
Other disruptions have centered on specific network features, Prince wrote in a blog post. “But in the last 6+ years we’ve not had another outage that has caused the majority of core traffic to stop flowing through our network.”
Cloudflare is a cloud services and cybersecurity company based in San Francisco that is used by approximately 20% of all websites, according to W3Techs. It’s one of a handful of services, along with Amazon Web Services, CrowdStrike and Fastly (all of which have experienced major outages in the past few years), that you might never have heard of but that provide essential internet infrastructure.
The bulk of sites and services impacted by Tuesday’s outage, which began around 3:30 a.m. PT, seemed to recover in just over three hours. By the end of the day, everything had returned to normal, and Cloudflare set about explaining what went wrong. Here’s what you need to know.
What caused the Cloudflare outage?
Cloudflare was keen to emphasize that the outage was not caused either directly or indirectly by a cyberattack. At first the company did suspect it might have originated from a “hyper-scale DDoS attack,” Prince said in his blog post. But it turned out that the outage resulted from an internal software failure.
A change in one of Cloudflare’s databases generated a larger-than-expected feature file, which was too big for the company’s software to run, said Prince. This caused the software to fail.
Once Cloudflare identified the problem, it was able to replace the problematic file with an earlier version and get most traffic flowing normally again by 6:30 a.m. PT.
“We are sorry for the impact to our customers and to the Internet in general,” said Prince. “Given Cloudflare’s importance in the Internet ecosystem any outage of any of our systems is unacceptable. That there was a period of time where our network was not able to route traffic is deeply painful to every member of our team. We know we let you down today.”
Which sites and services were impacted?
Cloudflare has a massive roster of clients across the internet, from websites that are household names to smaller services you might not have heard of. Due to its size, when it went down, it took many of those sites and services with it.
Among those affected by the outage was Downdetector, which is where most people go to report problems when services are offline. (Downdetector is owned by the same parent company as CNET, Ziff Davis.)
Once it got back up and running, Downdetector said that it received over 2.1 million reports during the outage period. Over 435,000 of these came from the US, with the UK, Japan and Germany appearing to be the countries that were next most affected.
