TL;DR — Token minimization is the wrong optimization target. Injecting ~870 tokens of upfront context saves 2-3 tool calls and 500+ thinking tokens per session. Optimize for value per token, not minimum tokens.
The Hidden Cost Model
Every token in a context window has a cost, and not just a monetary one: context also carries computational overhead. We cover the broader principles in our guide to context optimization. At $3-15 per million tokens (depending on model and direction), the naive response is to minimize token count.
But token minimization is the wrong optimization target — a lesson that becomes even more critical in multi-agent systems where token waste compounds across every agent.
The real question: when does injecting information upfront save more than it costs?
The Basic Math
Consider two approaches to providing an AI agent with project state. Injecting it upfront costs roughly 870 tokens and is instant. Letting the agent discover it via tool calls costs about 920 tokens plus latency. Token cost is roughly equivalent. So why does injection often win?
The hidden factor is relevance rate — how often the injected information actually gets used.
The value of injecting context equals the tokens saved (multiplied by how often the information actually gets used) minus the tokens spent injecting it. For a status command that costs 870 tokens to inject and saves ~920 discovery tokens in 80% of sessions, the pure token math comes out slightly negative: 920 × 0.8 − 870 ≈ −134 tokens. But this ignores latency.
Each tool call adds an API roundtrip — typically 200-500ms. The agent also spends “thinking tokens” deciding whether to check state. When you account for these factors, upfront injection often wins despite the token cost.
The Numbers — Injecting state upfront: ~870 tokens, instant. Agent self-discovery via tool calls: ~920 tokens + 200-500ms latency per roundtrip + thinking tokens. Near-identical token cost, but injection wins on total efficiency.
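The break-even math above can be sketched in a few lines. This is a minimal model of the article's example figures (920 discovery tokens, 80% relevance, 870 injection tokens, ~500 thinking tokens), not measured API data; the function name and the decision to weight thinking tokens by relevance rate are illustrative assumptions.

```python
def injection_value(tokens_saved: int, relevance_rate: float,
                    injection_cost: int, thinking_tokens: int = 0) -> float:
    """Expected net tokens saved by upfront injection (positive = injection wins).

    thinking_tokens: deliberation overhead the agent avoids in sessions
    where the injected context is actually used (a modeling assumption).
    """
    return (tokens_saved + thinking_tokens) * relevance_rate - injection_cost

# Pure token math from the example: slightly negative.
pure = injection_value(920, 0.8, 870)                        # ≈ -134

# Adding the ~500 thinking tokens the agent would otherwise burn
# deciding whether to check state flips the sign.
with_thinking = injection_value(920, 0.8, 870, thinking_tokens=500)  # ≈ +266
```

Latency is deliberately left out of the token units here; in interactive use it tips the balance even further toward injection.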
When High-Density Injection Wins
Inject upfront when:
- Information is used in >70% of sessions
- It eliminates “what’s the current state?” discovery phases
- It reduces multiple tool calls to zero
- The agent would otherwise waste thinking tokens on “should I check X?”
Real example: A session status summary (squad states, recent activity, active goals) gets referenced in nearly every interaction. The 870 tokens of upfront context saves 2-3 tool calls and 500+ thinking tokens per session.
When It Loses
Avoid upfront injection when:
- Information is used in <30% of sessions
- The injection would be one-size-fits-all across diverse task types
- The snapshot would be static when fresh data is critical
- The information is easily discoverable via targeted queries
Real example: Full project history (10,000+ tokens) when most sessions only need recent commits. Better to query on demand.
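The two checklists reduce to a simple decision rule. A hedged sketch: the 70% and 30% thresholds come from the article, while the function name, parameters, and return labels are illustrative.

```python
def should_inject(session_usage_rate: float,
                  diverse_task_mix: bool = False,
                  needs_fresh_data: bool = False) -> str:
    """Return 'inject', 'query', or 'depends' per the article's heuristics."""
    if needs_fresh_data or diverse_task_mix:
        return "query"        # static or one-size-fits-all injection loses
    if session_usage_rate > 0.70:
        return "inject"       # used in >70% of sessions: inject upfront
    if session_usage_rate < 0.30:
        return "query"        # used in <30%: discover on demand
    return "depends"          # gray zone: measure before deciding

should_inject(0.9)   # 'inject'  — e.g. a session status summary
should_inject(0.1)   # 'query'   — e.g. full project history
```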
Practical Measurements
We measured actual token costs for common context injections:
| Context Type | Chars | Tokens | Use Case |
|---|---|---|---|
| Minimal status | ~800 | ~200 | Session hooks (always) |
| Full status | 3,493 | ~870 | Most sessions |
| Full dashboard | 9,367 | ~2,340 | Deep analysis |
| Project CLAUDE.md | 8,000 | ~2,000 | Always relevant |
| Full codebase index | 40,000+ | ~10,000 | Rarely needed upfront |
At session start, ~870 tokens of context is less than 1% of a 200K token window. That’s cheap insurance against discovery overhead.
Our Data — A session status summary (squad states, recent activity, active goals) at ~870 tokens gets referenced in nearly every interaction. That’s less than 1% of a 200K token window — cheap insurance against discovery overhead.
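The token column in the table follows the common "about four characters per token" rule of thumb for Claude-style tokenizers. A quick sketch of that heuristic (an approximation, not a real tokenizer):

```python
def estimate_tokens(char_count: int) -> int:
    """Rough token estimate: ~4 characters per token for English prose."""
    return char_count // 4

estimate_tokens(3_493)   # ~873, matching the 'Full status' row (~870)
estimate_tokens(9_367)   # ~2341, matching the 'Full dashboard' row (~2,340)
```

For budgeting decisions like these, the 10-20% error of the heuristic rarely changes the answer; use the model provider's token-counting API when precision matters.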
Progressive Density Strategy
The optimal approach isn’t “inject everything” or “inject nothing” — it’s progressive density based on relevance probability.
At the lightest level, always inject squad names, activity flags, and critical state — about 200 tokens with 100% relevance. For most sessions, bump that up to full status, recent goals, and active work at around 870 tokens. Reserve the full load — complete history, all memory, deep context at 2,340+ tokens — for specific deep-analysis tasks.
The key insight: don’t optimize globally — optimize per session type.
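The three tiers can be expressed as plain data keyed by session type. The tier names, session-type labels, and lookup function below are illustrative stand-ins, not a real system's configuration; the token budgets are the article's figures.

```python
# Progressive density tiers from the article (budgets in tokens).
CONTEXT_TIERS = {
    "minimal":  {"tokens": 200,  "contents": ["squad names", "activity flags", "critical state"]},
    "standard": {"tokens": 870,  "contents": ["full status", "recent goals", "active work"]},
    "deep":     {"tokens": 2340, "contents": ["complete history", "all memory", "deep context"]},
}

def tier_for_session(session_type: str) -> str:
    """Pick a context tier per session type instead of one global setting."""
    mapping = {"hook": "minimal", "interactive": "standard", "analysis": "deep"}
    return mapping.get(session_type, "standard")
```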
The Full Picture
The real calculation accounts for more than raw tokens. You’re also saving latency (each avoided API roundtrip is 200-500ms), thinking tokens (the overhead the agent spends deciding whether to query), and the compound effect across sessions.
For interactive sessions, latency dominates — a 500ms roundtrip feels slow and disrupts flow. For background automation running overnight, token cost dominates instead. The optimization target shifts depending on how the agent is being used.
Applying This to Agent Design
Prompt Engineering
Structure prompts with relevance-aware sections:
## Context (Always Relevant)
{minimal_state}
## Extended Context (If Needed)
{full_state if complex_task else "Use tools to query"}
## Task-Specific Context
{injected only for matching task types}
Tool Descriptions
High-density descriptions for frequently-used tools pay off:
{
  "name": "search_codebase",
  "description": "Semantic search across all source files. Returns top 10 matches with surrounding context. Use for: finding implementations, understanding patterns, locating related code. Prefer over file reads when location unknown."
}
The longer description (~50 tokens) pays for itself in the thinking tokens the agent saves when deciding which tool to use.
Memory Loading
Load memory progressively:
- Session start: active goals, recent decisions (~500 tokens)
- On research task: full topic memory (~2,000 tokens)
- On complex analysis: everything relevant (5,000+ tokens)
Don’t load the full knowledge base for a simple commit message.
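The progressive loading above can be sketched as a small dispatcher. The task-type labels, layer names, and function signature are illustrative assumptions; a real implementation would pull these layers from an actual memory store.

```python
def load_memory(task_type: str) -> list[str]:
    """Load memory layers progressively by task type (token costs in comments)."""
    layers = ["active goals", "recent decisions"]    # ~500 tokens, every session
    if task_type in ("research", "analysis"):
        layers.append("full topic memory")           # ~2,000 tokens
    if task_type == "analysis":
        layers.append("all relevant knowledge")      # ~5,000+ tokens
    return layers

load_memory("commit")     # just the ~500-token base layer
load_memory("analysis")   # everything relevant
```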
Key Takeaway — Don’t optimize globally — optimize per session type. For interactive sessions, latency dominates. For background automation, token cost dominates. The optimization target shifts depending on how the agent is used.
The Trap of Token Minimization
Teams optimizing for minimum tokens often create slower, more expensive agents. An agent that doesn’t have context spends tokens deciding what context it needs, burns latency calling tools to discover it, may miss relevant information due to incomplete discovery, and repeats the whole process across sessions.
An agent with appropriate upfront context starts working immediately, references what it needs without tool calls, completes tasks faster with fewer total tokens overall, and maintains coherence across interactions.
The goal isn’t minimum tokens — it’s maximum value per token.
How to Measure This
Track which injected context actually gets referenced — that’s your real usage rate. Count the tool calls your agents make just to learn about their environment. Compare task completion times across different context levels. Include thinking tokens, discovery tokens, and injected tokens in your total cost calculations.
The numbers will tell you where your agents are wasting effort and where a little upfront context goes a long way. Optimize for relevance first, then density.
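A toy version of the reference-rate tracking described above. The class and method names are hypothetical; in practice you would log injections and references from real session transcripts rather than keeping counts in memory.

```python
from collections import defaultdict

class ContextUsageTracker:
    """Track how often each injected context block is actually referenced."""

    def __init__(self) -> None:
        self.injected = defaultdict(int)    # times each block was injected
        self.referenced = defaultdict(int)  # times the agent actually used it

    def record(self, block: str, was_referenced: bool) -> None:
        self.injected[block] += 1
        self.referenced[block] += was_referenced  # bool counts as 0 or 1

    def usage_rate(self, block: str) -> float:
        """Real usage rate: referenced / injected (the relevance rate)."""
        if self.injected[block] == 0:
            return 0.0
        return self.referenced[block] / self.injected[block]
```

Feed the resulting rates back into the >70% / <30% injection thresholds to decide what stays in the upfront context.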
Note: Token estimates based on Claude tokenization. GPT and other models may vary by 10-20%. The principles apply regardless of specific counts.