Are we approaching terminal velocity?
Four months of agentic AI acceleration and the inference ceiling arrived fast. Anthropic is buckling under demand, subscription limits are tightening, and the physical constraints of chips, energy, and data centre buildout aren't going away next quarter. The all-you-can-eat buffet is over.

Every week there's a new tool. A new model. A new benchmark that proves we're accelerating faster than ever. The narrative in every AI newsletter and LinkedIn post is the same: we're early. The best is ahead. Build faster.
Four months ago, that felt true. Now it feels like things are slowing down. Or maybe I've just gotten used to the speed?
In November last year, Anthropic was still an esoteric alternative, the service serious builders chose because the models were better for real work.
In December last year, engineers at the forefront realized that classic software development was over and scrambled to build new tools and capabilities. Tools like OpenClaw arrived, and we were off to the races!
What happened next was fast. Agentic workflows went from experiments to production. Tools like Claude Code, Cursor, and Windsurf turned subscription AI into all-day workhorses. Developers who'd been dipping a toe jumped in with both feet. The demand curve didn't bend; it broke upward.
And then at the beginning of March the mainstream started discovering Claude.
And Anthropic started buckling.
Outages. Peak-usage restrictions. Plan limits that I'm fairly sure were higher last month, quietly revised downward. Third-party agentic API access restricted, funnelling consumption toward first-party tools.
It seems the all-you-can-eat buffet is over, and anyone who built their workflow on the assumption it would always be on now has a whole new set of challenges.
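The first of those challenges is mundane: your tooling now has to assume the provider will throttle you. A minimal sketch of the defensive pattern in Python, where `call_model` is a hypothetical placeholder for whatever client you actually use:

```python
import random
import time

class RateLimited(Exception):
    """Stand-in for the 429 error a real client raises when throttled."""

def call_model(prompt: str) -> str:
    """Hypothetical placeholder for a real API call."""
    raise RateLimited  # pretend the provider is buckling

def call_with_backoff(prompt: str, max_retries: int = 5) -> str:
    # Exponential backoff with jitter: wait ~1s, 2s, 4s, ... between attempts
    # rather than hammering an endpoint that is already overloaded.
    for attempt in range(max_retries):
        try:
            return call_model(prompt)
        except RateLimited:
            time.sleep(2 ** attempt + random.uniform(0, 1))
    raise RuntimeError("Still throttled after retries; queue the work or fail over.")
```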
The tragedy of the commons, speedrun edition
Here's what nobody in the "we're early" crowd is accounting for.
The inference ceiling is largely a physical constraint. We have been building data centres at an incredible pace to augment our intelligence, and these huge compute facilities need chips, RAM, disk, cooling, and network infrastructure. They need people to design, build, and operate them. Every one of those inputs is constrained, and I think we're going to see those constraints become increasingly apparent.
Then there's funding. Building out AI inference at the scale the current demand trajectory requires costs enormous amounts of money, and the investment thesis only works if the revenue follows. Business 101.
Meanwhile, everyone who built on subscription-based AI access through popular third-party tools is facing the reckoning. The "unlimited" was never unlimited. It was "unlimited while we had spare capacity," and the spare capacity is fast drying up. In four months we've gone from building and accelerating to running harder just to stand still. The work expanded to fill the available compute. Now the compute has a hard ceiling, and the work hasn't stopped growing.
But what does all that mean?
The last four months were the acceleration phase. We have all been building and shipping faster every week, and the effort compounded. Same input, more output. It's been giddying and terrifying at the same time.
As this phase comes to an end and we approach terminal velocity, the extra effort goes into drag. We find workarounds for subscription limits, tune up alternate models, investigate local inference, and find ways to burn fewer tokens. The work goes into becoming more efficient at doing what we did last month.
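In practice that efficiency work looks something like routing: send each request to the cheapest model that can handle it, and fall back to local inference when the hosted tier says no. A rough sketch, with `hosted_model` and `local_model` as hypothetical stand-ins for a metered API and a box under your desk:

```python
def hosted_model(prompt: str) -> str:
    return "hosted response"  # placeholder for a metered frontier-model API

def local_model(prompt: str) -> str:
    return "local response"   # placeholder for a small model on local hardware

def route(prompt: str, needs_frontier: bool, tokens_left: int) -> str:
    # Spend premium tokens only on work that genuinely needs the big model;
    # everything else runs locally, where no meter is ticking.
    estimated_cost = len(prompt) // 4  # crude chars-to-tokens estimate
    if needs_frontier and tokens_left > estimated_cost:
        try:
            return hosted_model(prompt)
        except Exception:
            pass  # throttled or down: degrade gracefully instead of stopping
    return local_model(prompt)
```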
Hopefully in the process we can avoid enshittification.