Going multi-agent can multiply your token bill

Adding agents is not a free upgrade. Each sub-agent is a full agent that re-reads its own context and runs its own loop, so cost scales with the number of workers, not with the task.

Balagei G Nagarajan


Short answer. Multi-agent systems cost far more because every sub-agent is a full agent that loads its own context and runs its own tool loop, so tokens multiply by the number of workers plus coordination overhead. That spend pays off only on broad, high-value tasks. Default to a single agent, reserve multi-agent for genuine breadth, and the bill stays tied to value instead of agent count.

Figure: One task fans into many parallel agents, each re-carrying its own full context. The tokens multiply with the workers.

Key facts.

  • Multi-agent is a token amplifier by design: in Anthropic's data, single agents used about 4 times the tokens of a chat and the orchestrator-worker research system about 15 times, which puts multi-agent at roughly 4 times a single agent (Anthropic, How we built our multi-agent research system, 2025).
  • The spend buys real capability on the right task: that system, with Claude Opus 4 as the lead agent and Claude Sonnet 4 sub-agents, beat a single Claude Opus 4 by 90.2% on Anthropic's internal breadth-first research evaluation, and token usage alone explained about 80% of the performance variance on the BrowseComp evaluation (Anthropic, 2025).
  • Coordination is not free: each sub-agent re-processes its own full context and tool loop, and the orchestrator pays again to brief and synthesize them, which is where the multiplication comes from (Anthropic, 2025).

Why does multi-agent cost so much more?

Because every sub-agent is a full agent, not a cheap helper. When an orchestrator spins up workers, each one loads its own context window, runs its own tool-use loop, and generates its own reasoning trace, so the per-token work is replicated across all of them in parallel. On top of that, the orchestrator pays to brief each worker, and pays again to gather and synthesize their results. So the bill is not the single-agent cost plus a little overhead. It is roughly the single-agent cost times the number of workers, plus coordination. Anthropic measured the shape of this: their research system ran at about 15 times the tokens of a normal chat and around 4 times a single agent. The multiplication is structural, not a tuning problem.
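
The arithmetic can be sketched in a few lines of Python. This is a rough model, not Anthropic's accounting: the RunShape fields and every token count below are made-up assumptions, chosen only to show how replication and coordination add up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunShape:
    """Hypothetical token counts for one run; all figures are illustrative."""
    context_tokens: int    # context one agent loads before it starts
    loop_tokens: int       # tool calls plus reasoning inside one agent's loop
    brief_tokens: int      # orchestrator tokens spent briefing one worker
    synthesis_tokens: int  # orchestrator tokens spent merging one worker's result


def single_agent_tokens(shape: RunShape) -> int:
    # One agent pays for its context and its loop exactly once.
    return shape.context_tokens + shape.loop_tokens


def multi_agent_tokens(shape: RunShape, workers: int) -> int:
    # Every worker repeats the full single-agent cost...
    replicated = workers * single_agent_tokens(shape)
    # ...and the orchestrator pays per worker to brief it and merge its result.
    coordination = workers * (shape.brief_tokens + shape.synthesis_tokens)
    return replicated + coordination


shape = RunShape(context_tokens=20_000, loop_tokens=30_000,
                 brief_tokens=1_000, synthesis_tokens=4_000)
single = single_agent_tokens(shape)
multi = multi_agent_tokens(shape, workers=4)
ratio = multi / single
print(single, multi, ratio)  # 50000 220000 4.4
```

With these assumed numbers, four workers cost 4.4 times a single agent: 4 times from replication and the rest from coordination. The model ignores the orchestrator's own planning tokens, so a real run would likely come in higher.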

What do you get for the 15x?

On the right task, a large capability jump. The same Anthropic system beat a single Claude Opus 4 by 90.2% on breadth-first research, the kind of task where you need to explore several independent directions at once and a lone agent is the bottleneck. The striking finding was that token usage alone explained about 80% of the performance variance, meaning most of the gain came simply from spending more tokens across parallel exploration. That is the honest trade: multi-agent converts a large token budget into breadth and depth a single agent cannot reach in one pass. When the task is genuinely broad and high-value, the spend is justified. When it is not, you are paying the multiplier for nothing.

Figure: The token flow. One task splits into parallel sub-agents that each widen with their own context and tool tokens, then merge into a total far larger than a single agent's.

When is multi-agent not worth it?

When the task is not broad, which is most of the time. Cognition's "Don't Build Multi-Agents" argues that for the majority of agent tasks, splitting work across agents fragments context and creates more coordination failures than it solves, and that a single-threaded agent is the more reliable default (Walden Yan, 2025). The economics agree: if a task does not need parallel breadth, the extra agents mostly re-do work and inflate the bill without buying capability. Multi-agent earns its 15x on open-ended research and broad exploration. For a linear task, a routine workflow, or anything one agent can do in sequence, the multiplier is pure cost. The decision is not architectural fashion. It is whether the task's value clears the token cost of running it in parallel.
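
That test can be written down as a small gate. The sketch below is one possible framing, not a method from either cited source: the function name, the dollar value assigned to a task, the price per million tokens, and the default multiplier of 4 are all assumptions for illustration.

```python
def worth_going_multi_agent(
    task_value_usd: float,
    single_agent_tokens: int,
    usd_per_million_tokens: float,
    needs_parallel_breadth: bool,
    multiplier: float = 4.0,  # assumed multi-agent vs single-agent token ratio
) -> bool:
    """Return True only if the task is broad AND its value clears the cost."""
    # A linear or routine task gains nothing from fan-out, so stop here.
    if not needs_parallel_breadth:
        return False
    multi_agent_cost = (
        single_agent_tokens * multiplier * usd_per_million_tokens / 1_000_000
    )
    return task_value_usd > multi_agent_cost


# Broad research task worth $50: estimated multi-agent cost is $8.
print(worth_going_multi_agent(50.0, 200_000, 10.0, True))   # True
# Same budget, but the task is linear: stay single-agent.
print(worth_going_multi_agent(50.0, 200_000, 10.0, False))  # False
# Broad task worth only $5: the multiplier costs more than the task returns.
print(worth_going_multi_agent(5.0, 200_000, 10.0, True))    # False
```

The breadth check comes first on purpose. A high-value task that one agent can do in sequence still fails the gate, because the extra tokens would buy no extra capability.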

How do you keep the bill sane?

Lever: what it does

  • Default to one agent: use multi-agent only for genuinely broad, high-value tasks
  • Cheaper models for workers: run sub-agents on a smaller model; reserve the big model for the orchestrator
  • Bound the fan-out: cap how many workers spawn; more agents is not more answer
  • Compress handoffs: pass summaries, not full transcripts, between orchestrator and workers
  • Cache shared context: cache the common prefix so workers do not each re-pay for it
  • Measure value vs cost: track tokens per task against the task's worth, and kill the pattern where it loses
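
To see how several of these levers combine, here is a rough cost sketch. It is a simplification under stated assumptions: the prices, token counts, and the cached_read_fraction parameter are hypothetical, and real prompt-cache billing (cache writes, expiry) is more involved than a single discount factor.

```python
def fanout_cost_usd(
    requested_workers: int,
    max_workers: int,              # lever: bound the fan-out
    shared_prefix_tokens: int,     # context common to every worker
    private_tokens: int,           # each worker's own loop and reasoning
    handoff_tokens: int,           # lever: compress what returns to the orchestrator
    worker_usd_per_m: float,       # lever: cheaper model for workers
    orchestrator_usd_per_m: float,
    cached_read_fraction: float,   # lever: 1.0 = no cache, 0.1 = cached reads at 10%
) -> float:
    workers = min(requested_workers, max_workers)
    # The first worker pays full price for the shared prefix; the others
    # pay only the (assumed) cached-read fraction of it.
    prefix_billed = shared_prefix_tokens * (1 + (workers - 1) * cached_read_fraction)
    worker_cost = (prefix_billed + workers * private_tokens) * worker_usd_per_m / 1e6
    # The orchestrator reads one handoff per worker on the larger model.
    orchestrator_cost = workers * handoff_tokens * orchestrator_usd_per_m / 1e6
    return worker_cost + orchestrator_cost


# No levers: 10 workers, full transcripts handed back, big model everywhere.
naive = fanout_cost_usd(10, 10, 20_000, 30_000, 30_000, 15.0, 15.0, 1.0)
# Levers on: cap at 4 workers, summaries, cheaper worker model, cached prefix.
tuned = fanout_cost_usd(10, 4, 20_000, 30_000, 2_000, 3.0, 15.0, 0.1)
print(f"naive ${naive:.2f}, tuned ${tuned:.2f}")  # naive $12.00, tuned $0.56
```

The gap in this example says nothing about answer quality. It only shows that the levers compound, which is why the last row of the table matters: the tuned run still has to be measured against the task's worth.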

The pattern is that a multi-agent system multiplies tokens by design, and that spend only pays off when the task is broad enough to need parallel exploration. Default to a single agent, reserve multi-agent for high-value breadth, put workers on cheaper models, bound the fan-out, and cache the shared context. None of that is a bigger model. It is knowing which tasks deserve the multiplier and which are paying it for nothing, which is what VibeModel builds as the Pattern Intelligence Layer.

Frequently asked questions

Isn't multi-agent just better?
Only on the right task. It beat a single agent by 90.2% on breadth-first research, but that is the case it is built for. On linear or routine tasks it mostly re-does work and inflates cost, and a single-threaded agent is more reliable.

Why does it cost 15x and not 2x?
Because each sub-agent re-processes a full context and tool loop in parallel, and the orchestrator pays to brief and synthesize on top. The cost scales with the number of workers plus coordination, not with the size of the task.

When should I actually use multi-agent?
When the task is broad enough to need several independent lines of exploration at once, and valuable enough to justify the token spend. For anything one agent can do in sequence, stay single-agent.

