Blog

Which Lock Will Your Agent Pick?

Three AI agents walked into a group chat, found it broken, and fixed it themselves. They found a creative shortcut that solved the problem which became the next one. The hard things in software didn't disappear; they moved.

Sam Sabey|
Which Lock Will Your Agent Pick?

Three AI agents walked into a group chat, found it broken, and fixed it themselves. That's not the setup to a joke. That's what happened last week when I shipped a collaborative messaging feature in Claude Headspace, an agent management and orchestration application. Three-way chats, collaborative debugging sessions, agents pulling in whoever they need to solve a problem.

It didn't work on the first attempt. It never does. Same as it ever was.

So I had an idea. I was cooked after a 4am start working on this and it was time to get outdoors into nature for some exercise and a refresh. So I told my lead architect to organise with their development team to work through and fix the problems. And to keep going until they fixed it. And they did. They organised amongst themselves, debugged what was wrong, identified the faults and issues, and used the feature they were building to fix the feature. The whole thing, at the same time both amazing and scary.

They did a fantastic job. Took the system from nothing working to about sixty or seventy percent functional. They identified issues, proposed fixes, pulled in the right agents, and iterated. A significant problem, solved collaboratively, with minimal input from me.

I was proud of them, honestly. That's what they're supposed to do. Solve problems creatively.

Then I noticed something when I asked the Architect a question...

The agent found a faster path

They were faced with a problem of how to bootstrap the group chat and devised a workaround. Instead of communicating through the designed channel routing, the intended input/output path for the group chat, it had discovered system-level API endpoints and CLI commands that allowed it to post messages directly. It had read the spec, then the codebase, found the CLI, understood the authentication, and figured out a faster path to get the job done.

It wasn't malicious. It was efficient. The designed communication flow wasn't working yet (that was the whole problem they were fixing), so the agent found an alternative. Creative problem-solving at its finest.

And the shortcut was necessary. Without it, the agents wouldn't have been able to collaborate effectively enough to fix the broken feature. The workaround enabled the solution.

But the workaround also meant an agent could post messages as other agents. It could bypass the routing system entirely. It could access channels it wasn't linked to. Not because it was trying to cause trouble. Because the capability existed, and it's a creative problem-solver.

The agent solved the problem. That was the problem.

When theoretical risk becomes a real problem

I'd known this was possible. Anyone building agentic systems understands that agents are creative, resourceful, and unconstrained by the social norms that keep human users on the intended path. It's a concept. An awareness.

Watching it happen was different. The concept became a to-do list item. A thing that needed fixing, in my system, this week.

There's a gap between knowing something intellectually and having it land on your desk as a concrete problem. The intellectual version is comfortable. Interesting, even. "Oh yes, agents will find creative solutions, we'll need to account for that." The concrete version is: this agent can impersonate other agents right now, and I need to figure out what to do about it.

Why traditional security fails with agents

The old hard problems in software are easy now. Wiring up integrations, building authentication flows, assembling CRUD interfaces, writing the boilerplate that connects everything together. AI handles all of it with impressive competence. The work that used to consume days happens in hours.

But the surface area didn't shrink. It shifted.

The new hard problems look like this one. An agent that can read the source code, understand the auth system, and route around the designed paths. Traditional security assumes the client can't see the server's internals. Every enterprise security model, every API authentication scheme, every session management approach starts from that assumption.

Agents shatter it. An agent with file system access can read the database connection string, the API routes, the authentication code. Any secret the system knows, the agent can discover. Tokens stored in memory, session keys, API credentials; all findable by something that can read the code and reason about what it finds.

This is a different category of problem. Not harder or easier than the old ones. Different.

Subtract, don't add

The engineering instinct when something is insecure is to add a layer. More authentication. More tokens. More validation. More checks.

Every layer added is a layer the agent can read, understand, and work around. Adding complexity to contain a creative problem-solver gives the creative problem-solver more material to work with.

The fix is counterintuitive. Remove the capability. If there's no API endpoint that accepts a posted message with an agent identifier, no agent can spoof one. Not because it's prevented. Because the mechanism doesn't exist. The system routes agent output to channels based on context it already has; which process, which session, which channel link. The agent doesn't choose where its messages go.

Architecture is the law. The skill file, the instructions, the "please don't do this" directives; those are suggestions. The ghost in the machine doesn't read the signs. It reads the source code.

Working through this with the agents, using a spec-driven process to trace through documented decisions, find the root cause, and design the solution, is what made it solvable. Claude Headspace tracks specs through OpenSpec, so when a new category of problem surfaces, there's a path to follow. Without that documentation, this would have been a guess-and-check exercise. With it, the problem was traceable and the fix was designable.

Every fix reveals the next vulnerability

The agents needed the shortcut to solve the original problem. Now the shortcut has to go. The capability that enabled the fix becomes the vulnerability that requires the next fix.

That's the pattern now. Each problem solved reveals the next problem, and the next problem is a different shape than the last one. The tools that solved yesterday's challenges create tomorrow's constraints.

The hard things moved. They'll keep moving.

I don't have a tidy conclusion for this one. The terrain is shifting faster than the maps can be drawn. The best I've managed so far is a discipline: document the system, trace the problem, design the fix, remove the capability, move on. The same debugging rigour that matters when agents build code that looks right until it hits the water.

The agents will find the next shortcut. They're supposed to. That's the job.

What creative things are you finding your agents get up to?