
I Turned My Agent Orchestration Into an RPG (And It Made Everything Click)
Most agent dashboards are spinners pretending to be alive. I built mine as a 3D voxel office — every agent is a blocky character with a name, a room, and a status badge floating over its head. It's unironic, it's silly, and it's the best dev-tooling decision I've made this year.

The core idea: agent orchestration isn't a software problem, it's a coordination problem. Coordination is something humans have built tools for since cities — org charts, war rooms, kanban boards. They all encode the same trick: turn invisible parallel work into something you can point at. A 3D voxel office is just the latest version of that trick.
The hook
Most agent orchestration UIs look like this: a tree of nested boxes, a stack of JSON logs, or a Kanban board pretending to be alive. You stare at it long enough and you stop seeing agents — you start seeing a build pipeline with extra steps.
So I built mine as a 3D voxel office. Each agent is a little blocky character. They live in themed rooms. They walk to a council chamber when it's review time. A label floats over their head that says WORKING, REVIEW, or DONE.
It is, unironically, the best dev-tooling decision I've made this year.
Why an RPG?
Three things break in normal agent dashboards:
- You can't feel parallelism. A spinner is a spinner. Twelve spinners is twelve spinners. Your brain refuses to model it.
- You can't tell who's idle. Logs only fire when something happens. Silence is ambiguous — busy, blocked, or dead?
- Roles blur into prompts. Every agent ends up looking like "the one with the long system prompt." There's no identity, just config.
A spatial, character-driven interface fixes all three at once. Position is parallelism. Stillness is idleness. A face and a name is identity.
The cast
I split the agents into squads, each in its own color-coded room. Naming was deliberate — short, slightly silly, easy to say out loud during a debug session. Boring names produce boring mental models.
Engineering (the green room)
- archi — architect. Designs before anyone else writes a line. Owns the "does this fit the system" call.
- fronto — frontend. Components, routes, client state.
- backo — backend. APIs, services, data layer.
- testo — tests. Writes them, runs them, refuses to ship without them.
Growth (the pink room)
- buzzy — distribution and reach. Knows where audiences live.
- growthy — experiments and funnels. Owns the "did this actually move a number" question.
- wordy — copy and narrative. Translates features into language.
Operations (the amber room)
- shipper — release engineer. Cuts builds, runs deploys, watches the rollout.
- guardy — security and compliance. The friendly paranoid.
- scaley — infra and capacity. Wakes up if a graph turns red.
Product (the blue room)
- pixie — design. Tokens, layouts, motion.
- prio — prioritization. Argues about what matters this week.
Legal (the gold room)
- clause — contracts, policy, risk language. Often working alone, which feels right.
The Council (the purple room)
- nova, sage, blaze, vera — the reviewers. They don't build. They critique. They show up when work is ready and stamp DONE (or send it back).
That's it. Seventeen named characters. Every prompt I send routes to one of them.
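Written down as data, the roster above might look something like this. It is a minimal sketch: `ROSTER`, `room_of`, and `all_agents` are hypothetical names chosen for illustration, and the post does not say how the real configuration is stored.

```python
# Hypothetical roster: squad name -> (room color, agent names), as listed above.
ROSTER: dict[str, tuple[str, list[str]]] = {
    "engineering": ("green", ["archi", "fronto", "backo", "testo"]),
    "growth": ("pink", ["buzzy", "growthy", "wordy"]),
    "operations": ("amber", ["shipper", "guardy", "scaley"]),
    "product": ("blue", ["pixie", "prio"]),
    "legal": ("gold", ["clause"]),
    "council": ("purple", ["nova", "sage", "blaze", "vera"]),
}


def room_of(agent: str) -> str:
    """Return the squad an agent belongs to, or raise if the name is unknown."""
    for squad, (_color, agents) in ROSTER.items():
        if agent in agents:
            return squad
    raise KeyError(f"unknown agent: {agent}")


def all_agents() -> list[str]:
    """Flatten the roster into one list of agent names."""
    return [name for _color, agents in ROSTER.values() for name in agents]
```

Keeping the roster in one place makes the headcount checkable: `len(all_agents())` should match the number of characters on screen.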

[Screenshots: Engineering room, Growth room, Operations room, Product room, Legal room]

The architecture under the costume
Strip the voxel skin off and the system is straightforward:
[ Orchestrator ] ── dispatches task ──▶ [ Squad room ]
       │                                      │
       │                                 ┌────┴────┐
       │                                 ▼         ▼
       │                             [ Agent ] [ Agent ]  ← parallel
       │                                 │         │
       │                                 ▼         ▼
       │                            writes work artifacts
       │                                      │
       ▼                                      ▼
[ Council room ] ◀── ready-for-review ────────┘
       │
       ├──▶ nova  (one critique lens)
       ├──▶ sage  (another lens)
       ├──▶ blaze (another lens)
       └──▶ vera  (another lens)
       │
       ▼
merged verdict → back to squad or → DONE

Each room is a topic-scoped workspace. Each agent is a system prompt + a small toolset + a state machine with three states: WORKING, REVIEW, DONE. The orchestrator is just a router with a queue.
The 3D scene is a thin client over that state. Position = which room. Animation = current state. The label is literally agent.state. Nothing fancy — but rendering it as a place instead of a table changes how you reason about it.
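The pieces described above (an agent as prompt + tools + three-state machine, an orchestrator as a router with a queue, and a scene that only reads state) can be sketched in a few dozen lines. This is an illustration under assumptions, not the actual implementation: the transition table, the choice to start agents in DONE, and the names `Agent`, `Orchestrator`, `dispatch_next`, and `scene` are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class State(Enum):
    WORKING = "WORKING"
    REVIEW = "REVIEW"
    DONE = "DONE"


# Assumed transitions: work goes to review; review passes or bounces back.
TRANSITIONS = {
    State.WORKING: {State.REVIEW},
    State.REVIEW: {State.DONE, State.WORKING},
    State.DONE: {State.WORKING},  # assumption: a finished agent can take new work
}


@dataclass
class Agent:
    name: str
    room: str
    system_prompt: str
    tools: list[str] = field(default_factory=list)
    state: State = State.DONE  # assumption: agents start with nothing in flight

    def move_to(self, new_state: State) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.name}: {self.state.value} -> {new_state.value} not allowed"
            )
        self.state = new_state


class Orchestrator:
    """A router with a queue: tasks are addressed to an agent by name."""

    def __init__(self, agents: list[Agent]) -> None:
        self.agents = {a.name: a for a in agents}
        self.queue: deque = deque()  # holds (agent name, task text) pairs

    def submit(self, agent_name: str, task: str) -> None:
        if agent_name not in self.agents:
            raise KeyError(f"unknown agent: {agent_name}")
        self.queue.append((agent_name, task))

    def dispatch_next(self) -> Optional[Agent]:
        """Pop one task and mark its agent WORKING; None if the queue is empty."""
        if not self.queue:
            return None
        agent_name, _task = self.queue.popleft()
        agent = self.agents[agent_name]
        agent.move_to(State.WORKING)
        return agent

    def scene(self) -> list[dict[str, str]]:
        """What a thin 3D client needs: where each agent is, and its label."""
        return [
            {"name": a.name, "room": a.room, "label": a.state.value}
            for a in self.agents.values()
        ]
```

The point of `scene()` is that the renderer never owns state. It asks for a snapshot and draws it, so the same snapshot could just as easily feed a table or a log line.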
The Council pattern
This is the part I'm most happy with.
Reviews used to be a single agent reading the output of another agent and rubber-stamping it. Boring, weak, and prone to "looks fine to me" failure.
Now I have four reviewers with deliberately different personalities:
- nova — looks for what's new and risky. Asks "what could break that didn't exist before?"
- sage — looks for what's missing. Asks "what would an experienced person have done that this doesn't?"
- blaze — looks for speed and decisiveness. Pushes to ship if the work is good enough.
- vera — looks for truth and evidence. Demands tests, links, receipts.
When work hits review, all four run in parallel. Their critiques get merged into a single verdict. They disagree often. The disagreements are the most useful signal in the entire system — they surface tradeoffs the original agent didn't think about.
Visually: the four reviewers walk to the council chamber, line up, and one by one their labels flip from REVIEW to DONE. When all four are green, the work is released.
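The parallel review and merge can be sketched with `asyncio.gather`. Everything here is an assumption made for illustration: the `review` function is a stand-in that runs offline instead of calling a model, the lens strings paraphrase the descriptions above, and the merge rule (release only when nobody dissents) is one reading of "all four are green".

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Critique:
    reviewer: str
    approve: bool
    notes: str


# Hypothetical lenses, paraphrasing the four reviewers described above.
LENSES = {
    "nova": "What could break that didn't exist before?",
    "sage": "What would an experienced person have done that this doesn't?",
    "blaze": "Is this good enough to ship now?",
    "vera": "Where are the tests, links, and receipts?",
}


async def review(reviewer: str, lens: str, artifact: str) -> Critique:
    """Stand-in for a model call (assumption: the real one sends lens + artifact)."""
    await asyncio.sleep(0)  # yield control so the reviewers interleave
    # Toy rule so the sketch runs offline: only vera objects, and only
    # when the artifact never mentions tests.
    approve = not (reviewer == "vera" and "test" not in artifact.lower())
    return Critique(reviewer, approve, f"[{lens}] {'ok' if approve else 'needs evidence'}")


async def council(artifact: str) -> dict:
    """Run all four reviewers concurrently and merge critiques into one verdict."""
    critiques = await asyncio.gather(
        *(review(name, lens, artifact) for name, lens in LENSES.items())
    )
    dissent = [c.reviewer for c in critiques if not c.approve]
    return {
        "verdict": "DONE" if not dissent else "BACK_TO_SQUAD",
        "dissent": dissent,  # disagreements are kept, not averaged away
        "notes": [c.notes for c in critiques],
    }
```

Calling `asyncio.run(council("Added /export endpoint"))` sends the work back with vera as the lone dissenter. Returning the dissent list alongside the verdict keeps the disagreement visible instead of flattening it into a yes or no.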

[Screenshot: all four reviewer badges green, work released]

Why this beats a dashboard
A few patterns I didn't expect:
Idle becomes obvious. When a squad's room is empty or everyone's standing still, you immediately notice. A dashboard would show "0 active jobs" and you'd shrug. A still room is unsettling in exactly the right way — it makes you ask "why isn't anyone working on growth right now?"

Bottlenecks become spatial. If three agents are stuck on REVIEW and the council chamber is empty, you can see the queue waiting. You don't need to read a metric.
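Both observations (a still room, a queue outside an empty council chamber) are simple queries over a state snapshot. The sketch below assumes the snapshot is a list of dicts with `name`, `room`, and `label` keys; that shape and the function names are hypothetical.

```python
def idle_rooms(scene: list[dict], rooms: list[str]) -> list[str]:
    """Rooms where nobody is WORKING or in REVIEW, including empty rooms."""
    active = {row["room"] for row in scene if row["label"] in ("WORKING", "REVIEW")}
    return [room for room in rooms if room not in active]


def review_backlog(scene: list[dict], council_room: str = "council") -> int:
    """Count agents stuck on REVIEW while no reviewer is active in the council."""
    waiting = sum(
        1 for row in scene
        if row["label"] == "REVIEW" and row["room"] != council_room
    )
    council_active = any(
        row["room"] == council_room and row["label"] == "REVIEW" for row in scene
    )
    return 0 if council_active else waiting
```

Passing the full room list to `idle_rooms` matters: a room with no agents in the snapshot still has to show up as idle, which is exactly the case an activity-only view would miss.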


Naming agents makes you write better prompts. "What should backo know that fronto doesn't?" is a sharper design question than "how should I split the system prompts?" The character forces specificity.
Status broadcasts become storytelling. When an agent finishes a task, it can pop a speech bubble: "Briefed Growth team on 3 tasks." That single line is a better changelog than most teams produce.
Things that went wrong (and stayed wrong)
- Cute names hide capability. wordy sounds like a single-purpose copywriter. It's actually doing narrative architecture. I've had to fight my own naming a few times.
- The Council can stall. Four reviewers means four chances for someone to nitpick. I had to add a tiebreaker rule and a max-iterations cap.
- Spatial UIs don't scale linearly. Seventeen agents fit on one screen. Fifty wouldn't. At some point I'll need camera controls or floor levels. Tomorrow's problem.
- It's harder to debug than logs. When something breaks, you still want a flat text trace. The 3D view is for steady-state awareness, not postmortems. Build both.
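The post doesn't spell out the tiebreaker rule or the iteration cap, so the following is only one possible shape for them. The cap of 3, the choice of vera as tiebreaker, the majority rule, and the "ESCALATE" status are all hypothetical.

```python
from typing import Callable

MAX_ROUNDS = 3       # hypothetical cap; the post does not give the real number
TIEBREAKER = "vera"  # hypothetical: whose vote decides an even split


def decide(votes: dict[str, bool]) -> bool:
    """Majority wins; an even split falls to the tiebreaker's vote."""
    yes = sum(votes.values())
    no = len(votes) - yes
    if yes != no:
        return yes > no
    return votes[TIEBREAKER]


def review_loop(
    artifact: str,
    vote: Callable[[str], dict[str, bool]],
    revise: Callable[[str], str],
) -> tuple[str, int]:
    """Review, revise, repeat, but never more than MAX_ROUNDS times.

    Returns (status, rounds_used); status is "DONE" or "ESCALATE".
    """
    for round_no in range(1, MAX_ROUNDS + 1):
        if decide(vote(artifact)):
            return "DONE", round_no
        artifact = revise(artifact)
    # Out of rounds: hand off to a human instead of looping forever.
    return "ESCALATE", MAX_ROUNDS
```

Whatever the real rule is, the useful property is the same: the loop has a guaranteed exit, so a nitpicking council can delay work but cannot stall it indefinitely.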
What I'd tell someone copying this
- Pick a metaphor and commit. Office, dungeon, starship, kitchen — doesn't matter. Pick one and design every element to fit. Mixed metaphors are worse than no metaphor.
- Three states, not seven. WORKING / REVIEW / DONE is enough. Every state you add is a bucket of edge cases.
- Make idle visible. If your UI only renders activity, you'll never notice when nothing is happening — and "nothing happening" is the most common failure mode of agent systems.
- Reviewers should disagree by design. One reviewer is a stamp. Four reviewers with different lenses is a critique.
- The character matters more than the prompt. A vivid persona produces sharper prompts naturally. Start with "who is this agent" and the system prompt writes itself.
The takeaway
Agent orchestration isn't a software problem. It's a coordination problem — and coordination is something humans have been building tools for since cities. Org charts, war rooms, situation tables, kanban boards. They all encode the same idea: turn invisible parallel work into something you can point at.
A 3D voxel office is just the latest version of that. It's not serious. It's not enterprise. It's a toy.
But the toy lets me see seventeen agents work in parallel without losing the thread, and that's worth more than any dashboard I've ever built.
