What Do You Do When Your Gremlins Get Wet?

AI builds software faster than ever, but the bugs scale with the velocity. The gremlins are already on board — the question is what happens when they hit the water.

Sam Sabey
ai-development · debugging · software-craft

The ship is dry and everything looks fine

AI lets us build software at a speed that would've seemed absurd five years ago. Features that used to take weeks get assembled in hours. Screens, database tables, API endpoints, authentication flows: the whole stack materialises while you describe what you want.

The bar has been raised across the board. And yet. It's still the same game underneath.

Think about building a ship. Everything looks perfect while it sits up on the hard stand. The keel is laid, the frames raised, the plates riveted into place. If you've seen the old footage of Titanic being assembled at Harland and Wolff in Belfast, you know the image. Thousands of workers, an enormous steel structure taking shape piece by piece. Magnificent engineering on dry land.

But building the ship is only half the problem. Significant planning goes into what happens next. How to hold the ship on the slipway. How to control the slide. What happens at the moment of transition, when it's half on land and half in water. How it behaves when it finally floats free.

Software follows the same trajectory. The build phase feels controlled. Everything is dry. You can see the structure, test individual pieces, admire the shape of the thing. Then it needs to work for real.

The gremlins are already on board

Here's the thing nobody tells you about vibe-coded software: the gremlins are already on board before the ship ever moves.

They're tucked behind panels, curled up in cable runs, nested deep in the bilges, wedged into spaces between bulkheads that nobody inspected closely enough. They've been there since the first line of generated code. And they're perfectly content, because the ship is still dry.

Then it slides down the slipway and hits the water.

Water finds every gap. Every unsealed joint, every forgotten compartment, every hairline crack in a weld. It seeps into places that have never been wet before. And the gremlins? They get wet. And if you remember the rules, you know exactly what happens next.

They multiply.

The impact of AI on software development is Titanic in every sense of the word. It builds bigger, faster, more ambitious things than any of us have attempted before. But the gremlins scale with it. More generated code means more hiding places. More velocity means less time spent understanding what was built and where the gaps live.

Anyone who's built software knows the particular joy of debugging. Reading code line by line, then operating the code and discovering what's happening when real conditions meet assumptions. That joy hasn't gone anywhere. But the nature of it has shifted.

Speed hides the problems

Classical debugging was meditative. Sit down with the code, trace the logic, form a hypothesis about where things went wrong. Set breakpoints or add print statements. Step through execution line by line. Watch variables change. Turn the gears tooth by tooth and observe each one meshing with the next.

In that slow, deliberate process, you'd find the moment where a gear slipped. And in finding it, you'd already understand the fix. The insight and the solution arrived together.
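That loop can be sketched in a few lines. This is a hypothetical example (the function, values, and bug are invented for illustration): a print statement traces each iteration, the trace shows the running total is correct, so the fault must live in the one line the trace doesn't vouch for.

```python
# Classic hand-turned debugging: add a targeted print, run it,
# watch each value change, and spot the moment a gear slips.
def average(values):
    total = 0
    for i, v in enumerate(values):
        total += v
        print(f"step {i}: v={v} total={total}")  # watch each gear mesh
    # The slipped gear: an off-by-one divisor the trace points straight at.
    return total / (len(values) - 1)

avg = average([2, 4, 6])  # trace shows total=12 is right, yet avg is 6.0, not 4.0
```

Seeing the correct trace next to the wrong answer is the moment insight and fix arrive together: the divisor should be `len(values)`.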

With AI-assisted development, the gears spin so fast you never see them mesh. The code works (or appears to work) almost instantly. The feedback loop between "describe what I want" and "here it is" has compressed from days to minutes. And in that compression, the problems get hidden.

Not eliminated. Hidden.

Because the gremlins don't care how fast the ship was built. They care about the water.

Debugging agents with agents

I've been living this firsthand while building Claude Headspace. The AI assembles features with impressive precision. OAuth flows, multi-step user journeys, workspace management. It produces code that looks right, passes initial checks, and matches the spec.

Then I run it. And the invisible complexities surface. Edge cases in how tenants are scoped. Subtle ordering issues in multi-step flows. Authentication states that behave differently depending on which path a user took to arrive at a screen. Bugs hiding in places I didn't know existed, because I didn't write the code that created those places.

So I started building debug hooks.

These are small, purpose-built tools that give one AI agent visibility into another agent's operation. The approach works like this:

  • Instrumentation first. I get the agent to add targeted debug logging around the area where a problem is occurring. Not blanket logging; specific traces that capture the state at decision points.

  • Observe under real conditions. One agent monitors the logs in real time while a second agent operates the system, following the exact steps that trigger the bug. The observer watches what happens. The operator doesn't need to know it's being watched.

  • Assemble the picture. The observing agent takes everything it captured (the log output, the sequence of events, the state transitions) and puts together a coherent picture of what happened and where the behaviour diverged from the expected path.

  • Propose, don't fix. The agent doesn't go straight to a code change. It assembles a proposal: here's what I observed, here's where I believe the problem originates, here's my suggested approach. Then we workshop the fix together before any code gets touched.

This is the same debugging discipline I practised for years by hand. Analyse the problem. Form a hypothesis. Instrument the code. Observe execution. Isolate the fault. Propose a fix. Verify.

The tools changed. The discipline didn't.

Same craft, different hands

The gears still need to mesh. The ship still needs to float. The gremlins still need to be found before they multiply into something unmanageable.

What's changed is who's turning the gears and how fast they're spinning. The debugging mindset (patience, curiosity, systematic observation) matters more now than it did when everything moved slowly enough to see.

If you're building with AI agents, the question isn't whether your gremlins will get wet. They will. The question is whether you've built the tools and habits to find them when they do.

The water is coming. It always does.