6 March 2026

Hard Yakka: Building Software the Agentic Way

Started building a document intelligence system in November. By February I'd also built a development automation pipeline, an orchestration engine, and an agent monitoring platform. The tool built the tool that built the next tool. Old skills, new magic. In Australia, we call it hard yakka.

Sam Sabey

I wrote a few weeks ago about yak shaving and asked whether the tooling I was building was blocking production or blocking the ideal. I have an answer now. The yaks were real. Every one of them.

It started in November. I sat down to build RAGlue, a citation-first document intelligence system in Ruby on Rails. Spec-driven, agentic development. Claude Code writing the implementation, me reviewing and steering. The engineering problem was specific: can a retrieval system prove where its answer came from, down to the page and section, in a way that survives audit?

That problem was interesting. What I was spending my time on was not.

The operational wall

The agent writes code fast. That part lives up to the hype. What nobody mentions is everything that happens between "the agent wrote code" and "that code is in production." The gremlins are already on board.

Git commits need to be atomic and meaningful. Branches need to track changes against specs. When the agent takes a wrong turn, the rollback has to be clean. When three changes are in flight, each one needs its own context, its own branch, its own review cycle. The agent doesn't manage any of this. I was managing it by hand, and it was eating my weeks.

Spec drift was the other killer. I'd write a spec, the agent would build against it, I'd request changes, and within two iterations the implementation had wandered from what the spec described. Not wrong, necessarily. But untraceable. The connection between "what I asked for" and "what got built" was dissolving.

By December I was spending more time on process choreography than on RAGlue.

So I built the process

Mid-December I stopped building the product and started building the scaffolding.

First came the SOPs. Standard operating procedures for every step of an agentic development cycle: git automation, branch-per-change, pre-build snapshots, rollback points. Then came OpenSpec, a spec system that tracks changes as structured artifacts through their full lifecycle: proposal, delta spec, implementation tasks, verification. The spec and the code stay linked from start to finish.
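To make the lifecycle concrete, here is a minimal sketch of a change moving through those four stages on its own branch. It is an illustration only, not OpenSpec's actual data model: the `Change` class, the `Stage` enum, and the `change/<name>` branch naming are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # The four lifecycle stages named above, in order.
    PROPOSAL = 1
    DELTA_SPEC = 2
    TASKS = 3
    VERIFICATION = 4


@dataclass
class Change:
    """One tracked change (hypothetical model, not OpenSpec's real schema)."""
    name: str
    stage: Stage = Stage.PROPOSAL
    history: list = field(default_factory=list)

    @property
    def branch(self) -> str:
        # Branch-per-change: the branch name is derived from the change name.
        return f"change/{self.name}"

    def advance(self, note: str) -> None:
        """Record why the current stage was completed, then move to the next."""
        if self.stage is Stage.VERIFICATION:
            raise ValueError("change is already at the final stage")
        self.history.append((self.stage, note))
        self.stage = Stage(self.stage.value + 1)


change = Change("add-citations")
change.advance("proposal approved")
print(change.branch, change.stage.name)  # change/add-citations DELTA_SPEC
```

The point of the history list is traceability: every stage transition carries a note, so the path from "what I asked for" to "what got built" can be read back later.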

Then I built the orchestration engine. PRDs go into a queue. Each one gets a proposal, a build phase, automated testing with retry loops, validation against the spec, and a human checkpoint before merge. I can line up a batch of PRDs and pipe them through the engine one after another.
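The shape of that loop can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the engine itself: `run_pipeline` and its `build`, `test`, `validate`, and `approve` callables are hypothetical names, the proposal step is folded into `build` for brevity, and the retry limit of three is an arbitrary choice.

```python
from collections import deque


def run_pipeline(prds, build, test, validate, approve, max_retries=3):
    """Process PRDs one after another; return (merged, rejected) lists."""
    queue = deque(prds)
    merged, rejected = [], []
    while queue:
        prd = queue.popleft()
        artifact = None
        passed = False
        # Retry loop: rebuild until the tests pass or attempts run out.
        for attempt in range(1, max_retries + 1):
            artifact = build(prd, attempt)
            if test(artifact):
                passed = True
                break
        # Spec validation and the human checkpoint both gate the merge.
        if passed and validate(prd, artifact) and approve(prd, artifact):
            merged.append(prd)
        else:
            rejected.append(prd)
    return merged, rejected


# Stand-in callables: tests pass on the second attempt, and the human
# reviewer rejects "prd-b" at the checkpoint.
merged, rejected = run_pipeline(
    ["prd-a", "prd-b"],
    build=lambda prd, attempt: f"{prd}@{attempt}",
    test=lambda artifact: artifact.endswith("@2"),
    validate=lambda prd, artifact: artifact.startswith(prd),
    approve=lambda prd, artifact: prd != "prd-b",
)
print(merged, rejected)  # ['prd-a'] ['prd-b']
```

Passing the stages in as callables keeps the control flow visible: the agent can sit behind `build`, while `approve` stays a human decision.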

The agent does the work. The pipeline makes sure the work is traceable, reversible, and auditable.

I wasn't the only one

In late January I came across Peter Steinberger's Clawdbot project, now known as OpenClaw. He'd hit the same wall. Started in November, same timeframe, same frustrations with the operational gap in agentic development. His approach to managing agent workloads clicked something into place for me. I'd already been watching what OpenClaw was doing to the token charts. Independent builders, working on the same problem, arriving at parallel solutions. That's not coincidence. That's the frontier telling you what's missing.

Then the agents needed managing

The orchestration engine worked. I was running multiple agents across multiple projects. And the ergonomics of managing all of that in terminal sessions became its own problem.

So I vibe-coded a monitoring dashboard. It got messy fast. Classic.

I went back to pen and paper. Drew out what I wanted. Took photos of the sketches and notes, ran them through OCR, workshopped the output into a system prompt, and fed the whole thing into my own orchestration engine.

The engine built Claude Headspace. A real-time agent monitoring platform with frustration scoring, flow state tracking, priority ranking, and the ability to manage agents remotely. Built spec-driven, by the pipeline, the way everything gets built now.

The tool built the tool that built the next tool. There's a YouTube channel called Inheritance Machining where a mechanical engineer restores his grandfather's machine shop and makes new things with old tools. That's what this feels like. Thirty years of engineering skills, the old craft, fed into new tooling that builds things I couldn't have built before.

And it keeps going

Since then I've used Claude Headspace to build May Belle, a service automation platform, which has meant dealing with agents who find creative shortcuts you didn't design for. While continuing to extend Claude Headspace itself. While running other projects through the same pipeline.

Yesterday afternoon I was on my mountain bike on a trail somewhere in bushland along the Yarra River, stopping every now and then to check on my agents and steer them through builds. Multiple agents, multiple projects, spec-driven, auditable. And then it hit me. Back in early January I'd been on a ride, voice-chatting with Claude, workshopping the idea for what became Claude Headspace. Now I was on the same trails, using the thing I'd described. Four months ago I was stuck on git commits.

I didn't plan any of this. I planned to build a document intelligence system. Every tool that followed exists because the previous thing couldn't ship without it. The yaks were blocking production, not the ideal.

Some days I look at what's running and think about the gap between where this started and where it is. A frustrated developer automating git operations turned into a development platform that builds applications. In Australia, we'd call that hard yakka. The unsexy, load-bearing work that makes everything else possible.

I'm still not sure which product is the main one.