Robot meetings need minutes too
A multi-agent workshop produced sharp decisions that never made it into the spec. The fix wasn't technical; it was the same meeting protocol humans have used for decades. Agendas, scribes, and minutes work for the same reason regardless of whether the participants are carbon or silicon.

The workshop ran well. There were ten design decisions locked in, maybe twelve.
I'd set up a multi-agent session for the Kenwood project: Robbo as the architect, Mel as the business analyst, Mark covering the development angle. I gave them a brief, set them loose, and watched the session run. The conversation was sharp: Robbo pushing on architecture decisions, Mel flagging scope edges and dependencies, Mark working through the implementation concerns from the ground up. At the end of the session, the workshop document looked solid. Structured, detailed, decisions logged, next steps clear.
My first reaction was genuine: this is what structured AI collaboration looks like when it works. Impressive and terrifying at the same time.
The PRD was written, the build ran, and it looked like another clean delivery. Then I was going through the testing and I noticed something.
Decisions that had been agreed in the workshop weren't in the implementation. I went back to the workshop document, and they were there. Every one of them. But they hadn't made it into the PRD the build was running from.
The handoff step hadn't happened. Nobody had owned the distillation from workshop to spec. The decisions existed in the record, but they'd never been formalised before the group moved on.
That's not an AI problem. That's a meetings problem, and a distinctly human challenge to boot.
The context window is not working memory
The instinct is "but the agents had the full transcript." They did, and this is the trap.
A decision made in the first third of a long session is technically present in the transcript, but becomes functionally invisible by the end. Recent context crowds out earlier material; attention weights the last thirty minutes more heavily than the first, regardless of where the important decisions landed. Ask an agent to summarise a two-hour session and you'll get a summary that reflects how the conversation ended, not everything agreed along the way.
And the fix is older than AI
It's not technical. It's procedural. Humans have known it for decades.
Assign a scribe; not the producing agent or the facilitator, but a dedicated agent whose only job is to capture decisions at the moment they're made. Give the session explicit structure: speaking order, a decision log written to in real time, and an end-of-session review that confirms every decision is going into the next document before that document gets written.
This is meeting protocol. Agenda, assigned roles, minutes, and review before next steps. Not overhead; load-bearing structure.
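That protocol is concrete enough to sketch. The code below is a minimal, hypothetical illustration (the names `run_session`, `scribe_detects`, and `review_against_spec` are mine, not from any real framework): a dedicated scribe watches every turn and records decisions the moment they're made, and an end-of-session gate refuses to close until every logged decision appears in the next document.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    turn: int
    speaker: str
    text: str


@dataclass
class DecisionLog:
    entries: list[Decision] = field(default_factory=list)

    def record(self, turn: int, speaker: str, text: str) -> None:
        self.entries.append(Decision(turn, speaker, text))


def run_session(turns, scribe_detects):
    """Replay a session turn by turn. The scribe is not a participant;
    its only job is to log decisions in real time, at the moment they
    land, rather than summarising from memory afterwards."""
    log = DecisionLog()
    for i, (speaker, message) in enumerate(turns):
        if scribe_detects(message):  # hypothetical decision classifier
            log.record(i, speaker, message)
    return log


def review_against_spec(log: DecisionLog, spec_text: str) -> list[Decision]:
    """End-of-session review: return every logged decision that did not
    make it into the next document. An empty list means the handoff held."""
    return [d for d in log.entries if d.text not in spec_text]
```

In practice `scribe_detects` would be its own agent call; here a trivial marker convention stands in for it. The point is structural: the log is written during the session, so the review step is a mechanical diff rather than an act of recall.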
If you've sat through a project kickoff where someone took good notes and nobody transferred them into the spec, you've seen this failure before. Every planning workshop where the whiteboard got photographed and the photo sat in a Slack thread. Every sprint review where the verbal agreements never made it into the backlog. The rituals exist because the work doesn't survive the handoff without them.
The failure isn't unique to AI and neither is the fix. The ritual works for the same reason whether the participants are human or not: it manages attention rather than merely recording it.
That's the third reaction. First: look what they built. Second: something's off. Third: they need the same guardrails we do. Not because they're imitating human dysfunction, but because the dysfunction is structural. Same collaboration problem, different substrate.
The fix is the one we already had. What else are we assuming the system handles that it doesn't?