18 May 2026

The shakedown

$1,500 in tokens on a Tuesday, and a $10 subscription covering the tab. Anthropic is repricing inference, Uber blew its annual AI budget by March, and GitHub killed flat fee Copilot pricing because agentic workloads broke the model. The cost of inference is the elephant in the room, and the shakedown is coming for every business model pretending it doesn't exist.

Sam Sabey|

Yesterday my activity log tracked $1500 in token burn. This wasn't an unusual spike, however it was $500 over my average daily burn. On a Tuesday.

My monthly average sits between twenty three and twenty seven thousand dollars, which works out to roughly a thousand dollars per working day, approaching that of a good consulting rate. However my Claude subscription creates an economic distortion. My cost is about $10 a day on the Max20 plan, leading to a 100 to 1 disparity.

Burn Baby Burn, almost like the seventies where at the disco is an inferno.

Three weeks ago, something shifted in the Anthropic codebase. Code changes started appearing that pointed at a separation between subscription plans and SDK usage. The signals were subtle, but the direction was clear: the era of subsidised inference is ending. Last week it became official. SDK usage is now metered independently from subscriptions, which means the unlimited buffet that heavy builders have been running on is closing.

This should not be much surprise to anyone paying attention.

Uber's CTO Praveen Neppalli Naga confirmed the company blew through its entire 2026 AI budget in roughly four months. Claude Code adoption across their 5,000 engineer organisation jumped from 32% to 84% between December and March. Seventy percent of committed code was AI generated. Heavy users were burning $500 to $2,000 a month each, and internal leaderboards tracking usage accelerated consumption beyond any projection. His words: "The budget I thought I would need is blown away already. I'm back to the drawing board."

Chamath Palihapitiya's software startup 8090 disclosed that AI costs had more than tripled since November 2025, tracking toward $10M a year. The growth rate was 3x every quarter. It got bad enough that the company migrated away from Cursor specifically to cut token spend, switching to Claude Code's flat Pro plan to eliminate the per token bills. The inefficiency driver was what they called "Ralph Wiggum loops," agents prompting models over and over until a solution arrives.

Then GitHub killed flat fee Copilot pricing effective June 2026 because agentic usage made the model unsustainable from the vendor side. Their own words: "Today, a quick chat question and a multi hour autonomous coding session can cost the same amount." When the platform provider itself cannot absorb the inference cost, the pricing model is broken at a structural level.

My own token burn matches this closely, with month on month increase now leveling as I bump up against the weekly usage cap of my Claude subscription. I'm managing my usage within my weekly allowance, deferring builds and heavy days and I am reliably hitting 95% usage.

The Era of the V8

Once upon a time, petrol was cheap. We built big V8 engines that produced plenty horsepower with a corresponding high fuel burn. The optimisation was for power, not efficiency because the input cost was negligible and in the context of the economy at the time, this was sensible. And then the first oil crisis happened.

The market responded with small, turbocharged, highly strung engines producing the same horsepower or more from a fraction of the fuel. The engineering got better because it had to. The constraint created the craft.

Agentic workloads are the V8s of AI. We built them because tokens were cheap. Agents that loop, retry, explore, and burn through context windows were the natural product of subsidised inference, the same way a 5.0 litre V8 was the natural product of petrol at a dollar a gallon. The economics are about to force a redesign.

The bill

This is not a pricing decision by one company. The physics underneath applies to everyone.

Tokens are a finite resource constrained by the amount of silicon online and available, and the energy required to pump electrons through that silicon. The ability to make more silicon is itself constrained: helium, which is essential for cooling chip fabrication processes, faces supply pressures linked to geopolitics in the Middle East. Data centres are consuming electricity that communities need for homes and businesses, drawing water, occupying land, generating noise. The environmental and civic costs are becoming visible at ground level.

Anthropic is the canary. They are constrained on compute and showing it first. OpenAI is unlikely to impose the same restrictions in the short term because they are bleeding market share and subsidised inference is one lever for holding it. But the physics does not care about competitive strategy. Every provider sits on the same silicon, draws from the same energy grid, and faces the same fabrication constraints.

The economic stress precipitated by geopolitical decisions is squeezing everyone. Providers that have taken massive investment will need to start showing returns. The grass roots impact of data centres on local communities is generating pushback. It feels like the bill has been running for a while and the waiter is walking over.

The shakedown

The actual cost of inference is the elephant in the room.

Airlines don't pretend fuel is free. Fuel is one of their largest operating costs and their entire pricing structure reflects that, down to the surcharge on the ticket. Trucking companies pass fuel through. Shipping lines pass fuel through. Every industry where energy is a significant input cost has learned to price for it, hedge against it, and build operations around its volatility.

AI has not had that conversation yet.

SaaS products historically ran 80 to 90% gross margins because the software itself was efficient to operate. What happens to that business model when inference gobbles up another 40%? I am doing a thousand dollars a day in consulting. How is the market going to absorb me doubling that to cover the inference cost on top? The cost of the token burn has to go somewhere, and right now most products and services are pretending it doesn't exist.

That is what the shakedown is coming for. The cost of inference has to be incorporated into the business model, the service offering, the go to market. And more importantly, it has to guide what gets built in the first place. If the thing being built carries a heavy inference cost in its daily operation, the economics will eventually break it.

There are deeper questions in here, and this is the first post in a series that will work through them. What happens to the knowledge gap between those who burned tokens hard and those who sat on the fence? What happens when the heavy lifting stops and the skills we let atrophy are the ones we need? What do we build when the fuel is no longer cheap?

I do not know what the shakedown looks like. But it feels like preparation is no longer optional.

← All posts