We Rented the Mainframe Back
Two AI assistants went dark in forty-eight hours last week. In neither case did the model break. The thing that broke was the wire in front of it, and we built that fragility back on purpose.
On Wednesday, June 10, Google's Gemini stopped answering worldwide. It threw Error 1076 and Error 1099 at users in at least nine countries for roughly seven hours, from 3:26 in the morning Pacific until 10:30, when Google called it resolved and pointed at a backend database. The next afternoon, Microsoft's Copilot went dark for thousands of people, with DownDetector reports spiking past twelve thousand and the company eventually tracing it to a botched software update it had to roll back. Two assistants, two companies, about thirty-six hours apart.
Platforms go down! And, as a former employee of both companies, I can tell you that many thousands of employees are working INCREDIBLY hard to prevent this. But even so, a whole bunch of people (those tasked with choosing an AI model or company) who have never thought about tail latency or how many nines of uptime they need are suddenly having to learn the basics of service availability. And, sadly, we've done them a disservice, since we have spent two years arguing about whether these systems can reason, whether they're conscious, whether they'll take everyone's job. Last I checked, a team of humans fairly rarely disappears for hours on end. You can have the smartest model ever built, and it is worth exactly nothing to the person staring at a spinner because the token issuer two hops upstream just fell on its face.
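For anyone meeting "nines" for the first time, the arithmetic is short. Here is a minimal Python sketch that turns an availability target into an annual downtime budget. The function names are mine, and the 365-day year is a simplifying assumption, not anything Google or Microsoft publishes.

```python
# Rough downtime budgets for common availability targets.
# Assumption: a 365-day year (8760 hours), ignoring leap years.
HOURS_PER_YEAR = 365 * 24


def downtime_budget_hours(availability: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime allowed in a period at a given availability (0..1)."""
    return (1.0 - availability) * period_hours


def availability_after_outage(outage_hours: float,
                              period_hours: float = HOURS_PER_YEAR) -> float:
    """Best availability still reachable in a period after one outage."""
    return 1.0 - outage_hours / period_hours


if __name__ == "__main__":
    targets = [("two nines", 0.99), ("three nines", 0.999), ("four nines", 0.9999)]
    for label, target in targets:
        print(f"{label}: {downtime_budget_hours(target):.2f} hours/year")
    # The roughly seven-hour outage described above, taken on its own.
    print(f"after one 7-hour outage: {availability_after_outage(7):.4%} at best")
```

Three nines works out to about 8.76 hours a year, so a single seven-hour outage spends roughly 80 percent of that budget in one morning.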
So who cares if a chatbot takes an afternoon off? Well, in our new world, the chatbot has become load-bearing. Copilot is wired into Windows, into Edge, into the guts of Microsoft 365, doing code completion and drafting and the actual minute-to-minute of how a lot of people get work done. When it goes quiet, those people don't fall back to doing it the old way, because for a lot of them there is no old way anymore. And, as these assistants become more load-bearing, they are also facing growing pains. Network monitors logged a 30 percent jump in public-cloud outage events that same week, backed up by Forrester, which has been saying out loud for months that the AI build-out will trigger two multi-day hyperscaler outages this year. This is not a fluke; it is the shape of the thing.
NOW WE GET TO THE STUPID THING THAT MAKES ME SHAKE MY HEAD. We spent forty years walking away from this exact architecture, and last week we walked right back into it. The entire arc of computing from about 1980 to 2010 was decentralization. The PC pulled compute off the mainframe and put it on your desk, and the reason that mattered wasn't speed, it was blast radius. If your machine died, the company kept running. Then the cloud quietly recentralized all of it, which was a perfectly good trade when the cloud was mostly where your files lived and your email got sorted. But the AI assistant is a different animal. It generally isn't something you can route around, or hide behind a caching layer that masks intermittent outages. It's become the core of the engine that makes these rich local apps work. Welcome to timesharing on a PARC-MAXC in 1981. (AS AN ASIDE: If you have not watched Halt and Catch Fire, PLEASE go do so. It is both an exceptional story about really interesting characters and a love letter to the entire computing industry of that time.)
This is in NO WAY saying that Google and Microsoft are bad at this! They are about as good at running infrastructure as anyone who has ever lived, and it happened anyway, because at this level of concentration it is supposed to happen. When one backend database sits in the path of every Gemini query on Earth, that database is not a database. It's a fuse. The only open question is when it blows, and the status page will say everything is fine right up until the smoke clears. What we, the industry, need to do is build a multi-layer inference strategy, as we have been doing for other services for 20+ years, and enable some or all of those layers to live side by side and survive each other's failures. An assistant baked into your editor ought to degrade to something small and local when the mothership is unreachable, not transform into a loading animation. Interestingly, part of Gemini DID stay up during the outage: Flash Lite, the smallest, cheapest tier, kept partially answering. The "dumb" little model that ran closer to the edge survived because it wasn't routed through the expensive part that fell over.
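To make "degrade to something small and local" concrete, here is a minimal Python sketch of a tiered fallback chain. Everything in it is hypothetical: the tier names, the `TierUnavailable` exception, and the three stand-in model functions are illustrations I made up, not any vendor's API. A real version would wrap actual clients and add timeouts and health checks.

```python
from typing import Callable, List, Sequence, Tuple


class TierUnavailable(Exception):
    """Raised by a tier that cannot answer right now (hypothetical)."""


# A tier is a name plus a function that turns a prompt into an answer.
Tier = Tuple[str, Callable[[str], str]]


def answer_with_fallback(prompt: str, tiers: Sequence[Tier]) -> Tuple[str, str]:
    """Try each tier in order; return (tier_name, answer) from the first that works."""
    errors: List[str] = []
    for name, call in tiers:
        try:
            return name, call(prompt)
        except TierUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise TierUnavailable("all tiers failed: " + "; ".join(errors))


# Hypothetical stand-ins simulating an outage of both hosted tiers.
def hosted_large_model(prompt: str) -> str:
    raise TierUnavailable("backend database unreachable")


def hosted_small_model(prompt: str) -> str:
    raise TierUnavailable("request timed out")


def local_small_model(prompt: str) -> str:
    return f"[local, reduced quality] {prompt[:40]}"


TIERS: List[Tier] = [
    ("hosted-large", hosted_large_model),
    ("hosted-small", hosted_small_model),
    ("local", local_small_model),
]

if __name__ == "__main__":
    tier, text = answer_with_fallback("Summarize this diff", TIERS)
    print(tier, text)
```

The point of the ordering is that the last tier shares nothing with the first two. If the local model needed the same token issuer as the hosted ones, it would not be a fallback, just a second thing behind the same fuse.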
A few weeks ago I wrote that Apple had subcontracted Siri's brain to Gemini. Two days after that post went up, Gemini spent seven hours returning error codes to half the planet. There's zero schadenfreude here; it's a super annoying problem that no amount of engineering can prevent. What I hope happens is that we figure out how to augment the existing choices in architecture. "The Cloud" is already the number-two line item on a lot of IT budgets, right behind payroll, and InfoWorld has gone ahead and called 2026 the year we stop trusting any single cloud. We solved this problem in 1995 and then we just un-solved it, because renting was easier than owning. The bill for that decision doesn't come due as a price. It comes due as a Thursday when nobody can work.
Want to learn how intelligent data pipelines can reduce your AI costs? Check out Expanso. Or don't. Who am I to tell you what to do.
NOTE: I'm currently writing a book based on what I have seen of the real-world challenges of data preparation for machine learning, focusing on operations, compliance, and cost. I'd love to hear your thoughts!