Short answer. Multi-agent systems cost far more because every sub-agent is a full agent that loads its own context and runs its own tool loop, so tokens multiply by the number of workers plus coordination overhead. That spend pays off only on broad, high-value tasks. Default to a single agent, reserve multi-agent for genuine breadth, and the bill stays tied to value instead of agent count.
Figure: One task fans out into many parallel agents, each carrying its own full context, so the tokens multiply with the workers.
Key facts.
- Multi-agent is a token amplifier by design: Anthropic reports that its multi-agent systems use about 15 times the tokens of a chat interaction, while single agents use about 4 times, which puts multi-agent at roughly 4 times a single agent (Anthropic, How we built our multi-agent research system, 2025).
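- The spend buys real capability on the right task: that system, with Claude Opus 4 as the lead agent and Claude Sonnet 4 sub-agents, outperformed single-agent Claude Opus 4 by 90.2% on Anthropic's internal research eval, and in Anthropic's analysis of the BrowseComp evaluation, token usage alone explained about 80% of the performance variance (Anthropic, 2025).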
- More agents do not come with free coordination: each sub-agent re-processes its own full context and tool loop, and the orchestrator pays again to brief the workers and synthesize their results, which is where the multiplication comes from (Anthropic, 2025).
Why does multi-agent cost so much more?
Because every sub-agent is a full agent, not a cheap helper. When an orchestrator spins up workers, each one loads its own context window, runs its own tool-use loop, and generates its own reasoning trace, so the per-token work is replicated across all of them in parallel. On top of that, the orchestrator pays to brief each worker, and pays again to gather and synthesize their results. So the bill is not the single-agent cost plus a little overhead. It is roughly the single-agent cost times the number of workers, plus coordination. Anthropic measured the shape of this: their research system ran at about 15 times the tokens of a normal chat and around 4 times a single agent. The multiplication is structural, not a tuning problem.
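The "single-agent cost times the number of workers, plus coordination" shape can be sketched as a small token model. This is an illustrative sketch, not Anthropic's accounting: the `FanOut` class, its field names, and every number below are hypothetical, and it assumes each worker costs about as much as one single-agent run.

```python
from dataclasses import dataclass


@dataclass
class FanOut:
    """Hypothetical token model for one orchestrator-worker run."""
    worker_tokens: int     # tokens one worker spends on context, tool loop, and reasoning
    workers: int           # number of sub-agents spawned
    brief_tokens: int      # orchestrator tokens spent briefing one worker
    synthesis_tokens: int  # orchestrator tokens spent merging one worker's result

    def total_tokens(self) -> int:
        # The per-worker cost is replicated once for every worker.
        replicated = self.worker_tokens * self.workers
        # Coordination is paid per worker as well: brief it, then synthesize its result.
        coordination = (self.brief_tokens + self.synthesis_tokens) * self.workers
        return replicated + coordination


def multiplier(run: FanOut) -> float:
    """Token cost of the fan-out, expressed in single-agent runs."""
    return run.total_tokens() / run.worker_tokens


# Made-up numbers, chosen only to show the shape of the bill.
run = FanOut(worker_tokens=40_000, workers=4, brief_tokens=1_000, synthesis_tokens=3_000)
print(run.total_tokens())  # 176000
print(multiplier(run))     # 4.4
```

With four workers the run costs a little over four single-agent runs, and the remainder is coordination. Tuning prompts can shrink the individual terms, but the multiplication by `workers` stays.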
What do you get for the 15x?
On the right task, a large capability jump. The same Anthropic system outperformed single-agent Claude Opus 4 by 90.2% on Anthropic's internal research eval, and it excelled especially on breadth-first queries, the kind of task where you need to explore several independent directions at once and a lone agent is the bottleneck. The striking finding, from Anthropic's analysis of the BrowseComp evaluation, was that token usage alone explained about 80% of the performance variance, which suggests most of the gain came simply from spending more tokens across parallel exploration. That is the honest trade: multi-agent converts a large token budget into breadth and depth a single agent cannot reach in one pass. When the task is genuinely broad and high-value, the spend is justified. When it is not, you are paying the multiplier for nothing.
Figure: The token flow. One task splits into parallel sub-agents, each widening with its own context, and then the results merge into a total far larger than a single agent's.
When is multi-agent not worth it?
When the task is not broad, which is most of the time. Cognition's "Don't Build Multi-Agents" argues that for the majority of agent tasks, splitting work across agents fragments context and creates more coordination failures than it solves, and that a single-threaded agent is the more reliable default (Walden Yan, 2025). The economics agree: if a task does not need parallel breadth, the extra agents mostly re-do work and inflate the bill without buying capability. Multi-agent earns its 15x on open-ended research and broad exploration. For a linear task, a routine workflow, or anything one agent can do in sequence, the multiplier is pure cost. The decision is not architectural fashion. It is whether the task's value clears the token cost of running it in parallel.
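The test of "does the task's value clear the token cost" can be written down as a two-part check. This is a minimal sketch under stated assumptions: the function names, the flat price per million tokens, and the dollar figures are all hypothetical, and estimating the added value of a task is the hard part that the code does not solve.

```python
def extra_cost_usd(single_tokens: int, multi_tokens: int, usd_per_million: float) -> float:
    """Added spend from running a task multi-agent instead of single-agent."""
    return (multi_tokens - single_tokens) * usd_per_million / 1_000_000


def should_fan_out(is_broad: bool, added_value_usd: float, added_cost_usd: float) -> bool:
    """Fan out only when the task needs parallel breadth AND the gain clears the bill."""
    return is_broad and added_value_usd > added_cost_usd


# Hypothetical task: 100k tokens single-agent, 400k multi-agent, at $10 per million tokens.
cost = extra_cost_usd(single_tokens=100_000, multi_tokens=400_000, usd_per_million=10.0)
print(cost)                                     # 3.0
print(should_fan_out(True, 50.0, cost))         # True: broad, and the value clears the cost
print(should_fan_out(False, 50.0, cost))        # False: a linear task stays single-agent
```

The breadth check comes first on purpose: a valuable but linear task still fails it, because extra agents would mostly re-do work.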
How do you keep the bill sane?
| Lever | What it does |
|---|---|
| Default to one agent | Use multi-agent only for genuinely broad, high-value tasks |
| Cheaper models for workers | Run sub-agents on a smaller model; reserve the big model for the orchestrator |
| Bound the fan-out | Cap how many workers spawn; more agents is not more answer |
| Compress handoffs | Pass summaries, not full transcripts, between orchestrator and workers |
| Cache shared context | Cache the common prefix so workers do not each re-pay for it |
| Measure value vs cost | Track tokens per task against the task's worth, and kill the pattern where it loses |
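To see how several of these levers stack, here is a rough cost sketch. Everything in it is an assumption made for illustration: the `Prices` values are placeholders rather than any vendor's rates, all tokens are billed at one flat rate per model, and a cached prefix is assumed to be billed at a fixed fraction of the normal price after the first worker pays for it in full.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prices:
    """USD per million tokens. Placeholder values, not real vendor pricing."""
    big_model: float
    small_model: float
    cached_fraction: float  # share of the normal price paid for cached prefix tokens


def run_cost_usd(
    workers: int,
    prefix_tokens: int,   # context shared by every worker
    own_tokens: int,      # each worker's private tool loop and reasoning
    handoff_tokens: int,  # what each worker hands back to the orchestrator
    prices: Prices,
    max_workers: Optional[int] = None,  # lever: bound the fan-out
    small_workers: bool = False,        # lever: cheaper models for workers
    cache_prefix: bool = False,         # lever: cache shared context
) -> float:
    n = workers if max_workers is None else min(workers, max_workers)
    worker_rate = prices.small_model if small_workers else prices.big_model
    if cache_prefix and n > 0:
        # First worker pays full price for the prefix; the rest pay the cached rate.
        prefix_units = prefix_tokens * (1 + (n - 1) * prices.cached_fraction)
    else:
        prefix_units = prefix_tokens * n
    worker_cost = (prefix_units + own_tokens * n) * worker_rate
    # The orchestrator reads every handoff on the big model.
    orchestrator_cost = handoff_tokens * n * prices.big_model
    return (worker_cost + orchestrator_cost) / 1_000_000


prices = Prices(big_model=10.0, small_model=2.0, cached_fraction=0.1)

# Eight big-model workers returning full transcripts.
baseline = run_cost_usd(8, 20_000, 30_000, 20_000, prices)
# Four small-model workers, cached prefix, summaries instead of transcripts.
tuned = run_cost_usd(4, 20_000, 30_000, 2_000, prices,
                     max_workers=4, small_workers=True, cache_prefix=True)

print(f"{baseline:.2f}")  # 5.60
print(f"{tuned:.2f}")     # 0.37
```

The gap between the two figures comes entirely from the invented inputs, so treat it as a shape rather than a forecast. The sketch also says nothing about whether the tuned run still produces an answer worth having, which is what the last row of the table is for.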
The pattern is that a multi-agent system multiplies tokens by design, and that spend only pays off when the task is broad enough to need parallel exploration. Default to a single agent, reserve multi-agent for high-value breadth, put workers on cheaper models, bound the fan-out, and cache the shared context. None of that is a bigger model. It is knowing which tasks deserve the multiplier and which are paying it for nothing, which is what VibeModel builds as the Pattern Intelligence Layer.
Frequently asked questions
Isn't multi-agent just better?
Only on the right task. It beat a single agent by 90.2% on breadth-first research, but that is the case it is built for. On linear or routine tasks it mostly re-does work and inflates cost, and a single-threaded agent is more reliable.
Why does it cost 15x and not 2x?
Because each sub-agent re-processes a full context and tool loop in parallel, and the orchestrator pays to brief and synthesize on top. The cost scales with the number of workers plus coordination, not with the size of the task.
When should I actually use multi-agent?
When the task is broad enough to need several independent lines of exploration at once, and valuable enough to justify the token spend. For anything one agent can do in sequence, stay single-agent.

