The Context Problem
As Andrej Karpathy puts it, the LLM context window is like RAM: it requires careful curation, not maximization.
This matters more for multi-agent systems. Anthropic’s research shows multi-agent architectures can consume 15x more tokens than single-agent approaches. Without context engineering, costs spiral, performance degrades, and agents fail.
The goal isn’t to fill context windows. It’s to find the smallest possible set of high-signal tokens that maximize desired outcomes.
Context Failure Modes
Before discussing solutions, understand the problems:
| Failure Mode | What Happens |
|---|---|
| Context Rot | Model performance degrades as token count increases |
| Context Poisoning | Hallucinations enter stored information, compound over time |
| Context Distraction | Excessive information overwhelms the model |
| Context Confusion | Irrelevant content influences responses |
| Context Clash | Conflicting information within the same context |
These aren’t theoretical. We’ve observed all of them in production multi-agent systems. Context rot alone caused a 23% accuracy drop in one research agent operating above 60% context utilization.
The 40% Smart Zone
Claude Code triggers auto-compact at 95% context saturation. But performance degrades long before that threshold.
Based on our testing across 45 agents in 8 squads:
| Context Utilization | Performance Impact |
|---|---|
| 0-30% | Optimal reasoning quality |
| 30-40% | Smart Zone - good balance of context and reasoning |
| 40-60% | Noticeable degradation on complex tasks |
| 60-80% | Significant quality loss, more hallucinations |
| 80-95% | Severe degradation, unreliable outputs |
| 95%+ | Auto-compact triggers, context loss |
Recommendation: Design agents to operate within the 30-40% Smart Zone. Build in checkpoints and summarization before context utilization crosses 40%.
The Four Techniques
Based on Anthropic’s official guidance, context management falls into four categories.
1. Write Context (Externalize)
Pattern: Persist information outside the context window. Retrieve when needed.
| Approach | Implementation | Use Case |
|---|---|---|
| Scratchpads | Tool calls write to runtime state | Multi-step reasoning |
| Long-term memory | Agent synthesizes to storage | Cross-session learning |
| Todo lists | Progress trackers as files | Complex tasks |
| Notes files | NOTES.md, CLAUDE.md patterns | Project context |
In practice: Agents should write summaries to files rather than accumulating them in context.
- Instead of: keeping all findings in working memory
- Do: write findings to files and keep only a summary in context
A research agent investigating 10 sources should write each source analysis to a file, keeping only a 2-3 sentence summary in context. Total context: ~500 tokens instead of ~15,000.
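A minimal sketch of this write-then-summarize pattern. The `summarize` helper is a placeholder for whatever model call you use; only the short summary and a file pointer stay in working context:

```python
from pathlib import Path

NOTES_DIR = Path("agent_notes")
NOTES_DIR.mkdir(exist_ok=True)

def summarize(text: str, max_sentences: int = 3) -> str:
    """Placeholder: in a real agent this would be an LLM call that
    condenses the analysis to 2-3 sentences."""
    return " ".join(text.split(". ")[:max_sentences])

def analyze_source(source_id: str, raw_analysis: str, working_context: list[str]) -> None:
    # Write the full analysis to disk (externalized memory)...
    (NOTES_DIR / f"{source_id}.md").write_text(raw_analysis)
    # ...and keep only a short summary plus a file reference in context.
    working_context.append(
        f"{source_id}: {summarize(raw_analysis)} (full notes: agent_notes/{source_id}.md)"
    )

context: list[str] = []
analyze_source("source_01", "Long analysis of the first source. It covers X. It concludes Y.", context)
print(context)  # one short line per source instead of the full analysis
```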
2. Select Context (Retrieve)
Pattern: Dynamically fetch only relevant information at runtime.
| Approach | Implementation | Use Case |
|---|---|---|
| RAG retrieval | Embeddings + vector search | Large knowledge bases |
| Static files | CLAUDE.md loaded upfront | Project conventions |
| Just-in-time | Glob/grep during execution | Code exploration |
| Tool descriptions | RAG on tool docs | Large tool sets |
Evidence: RAG on tool descriptions showed 3x improvement in tool selection accuracy for agents with 20+ available tools.
In practice:
- Load CLAUDE.md files upfront (always relevant)
- Use file search for discovery, read only what’s needed
- Don’t pre-load “just in case”
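A sketch of runtime tool selection. Keyword overlap stands in for embedding similarity here; a production system would run a vector search over the tool docs instead, but the shape is the same: surface only the few relevant descriptions, not the whole catalog.

```python
def score(query: str, description: str) -> float:
    """Stand-in for embedding similarity: fraction of query words
    that appear in the tool description."""
    q = set(query.lower().split())
    d = set(description.lower().split())
    return len(q & d) / max(len(q), 1)

TOOL_DOCS = {
    "grep_code": "search source code for a pattern and return matching lines",
    "read_file": "read the contents of a file at a given path",
    "run_tests": "execute the project test suite and report failures",
    "create_issue": "open a tracking issue with a title and body",
    # ...imagine 20+ more tools here
}

def select_tools(task: str, k: int = 3) -> list[str]:
    # Load only the k most relevant tool descriptions into context
    # instead of every tool's documentation.
    ranked = sorted(TOOL_DOCS, key=lambda name: score(task, TOOL_DOCS[name]), reverse=True)
    return ranked[:k]

print(select_tools("search the code for the function that reads a file"))
```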
3. Compress Context (Summarize)
Pattern: Condense information while preserving critical decisions.
| Approach | Threshold | Trade-off |
|---|---|---|
| Auto-compact | 95% context saturation | May lose subtle context |
| Recursive summarization | At agent boundaries | Compression artifacts |
| Context trimming | Remove older messages | Lost history |
| Tool result clearing | After processing | Safest approach |
In practice:
- Summarize after completing subtasks (2-3 sentences)
- Drop tool outputs after extracting conclusions
- Keep decisions and rationale, not raw data
Warning: Overly aggressive compression risks losing subtle but critical context. Test compression on representative tasks before deploying.
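A sketch of a compression checkpoint, assuming a rough 4-characters-per-token heuristic and a placeholder `llm_summarize` call. Bulky tool outputs are condensed once the context nears the Smart Zone ceiling; messages marked as decisions are left untouched.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    role: str                   # "tool", "assistant", "user"
    content: str
    compressible: bool = True   # mark decisions/rationale as False

@dataclass
class Context:
    messages: list[Message] = field(default_factory=list)

    def tokens(self) -> int:
        # Rough heuristic: ~4 characters per token.
        return sum(len(m.content) for m in self.messages) // 4

def llm_summarize(text: str) -> str:
    """Placeholder for an LLM call that keeps conclusions, not raw data."""
    return text[:200] + " ..."

def checkpoint(ctx: Context, window: int = 200_000) -> None:
    """After a subtask: if utilization nears 40%, replace bulky tool
    outputs with short summaries; keep decisions and rationale intact."""
    if ctx.tokens() < int(window * 0.4):
        return
    for m in ctx.messages:
        if m.role == "tool" and m.compressible:
            m.content = llm_summarize(m.content)
```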
4. Isolate Context (Sub-agents)
Pattern: Specialized agents with clean, focused context windows.
| Approach | Benefit | Cost |
|---|---|---|
| Task-specific sub-agents | Deep focus | Coordination overhead |
| Parallel execution | More total tokens on problem | 15x token multiplier |
| Condensed handoffs | Clean interfaces | Information loss risk |
In practice:
- Sub-agents return summaries (1,000-2,000 tokens), not full results
- Each sub-agent focuses on one concern
- Parent coordinates, doesn’t duplicate work
Evidence: Splitting complex research across sub-agents (each with isolated context) significantly outperformed single-agent approaches—90% improvement on multi-source synthesis tasks.
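A sketch of the isolation pattern. `run_subagent` is a stand-in for however your framework spawns an agent with a fresh context window; the point is that each worker receives a minimal brief and the parent keeps only the condensed result.

```python
def run_subagent(brief: str) -> str:
    """Placeholder: in a real system this starts a fresh agent with its
    own clean context window and returns its final summary."""
    return f"Summary of work on: {brief[:60]}..."

def investigate(topics: list[str]) -> list[str]:
    findings = []
    for topic in topics:
        # Each sub-agent gets a minimal brief, not the parent's history.
        brief = (
            f"## Task\nResearch {topic}\n"
            "## Constraints\n- Cite sources\n- Return at most 1,500 tokens\n"
            "## Expected Output\nShort synthesis with key conclusions"
        )
        # Only the condensed summary enters the parent's context.
        findings.append(run_subagent(brief))
    return findings

print(investigate(["context rot", "tool selection accuracy"]))
```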
Agent-Specific Budgets
Different agent types have different context needs:
| Agent Type | Context Target | Budget/Run | Timeout | Rationale |
|---|---|---|---|---|
| Monitor | < 20% | $0.50-1.00 | 5 min | Fetch → Report (focused) |
| Analyzer | < 30% | $1.00-2.00 | 10 min | Read upstream → Synthesize |
| Generator | < 40% | $2.00-5.00 | 15 min | Create artifacts (needs more context) |
| Orchestrator | < 25% | $2.00-3.00 | 15 min | Coordinate, don’t accumulate |
| Reviewer | < 30% | $1.00-2.00 | 5 min | Diff + rules (bounded input) |
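These budgets are easiest to enforce when expressed as configuration. A sketch with illustrative names mirroring the table above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentBudget:
    context_target: float   # max fraction of the context window
    max_cost_usd: float     # hard budget per run
    timeout_minutes: int

# Values mirror the table above; the keys are illustrative.
BUDGETS = {
    "monitor":      AgentBudget(0.20, 1.00, 5),
    "analyzer":     AgentBudget(0.30, 2.00, 10),
    "generator":    AgentBudget(0.40, 5.00, 15),
    "orchestrator": AgentBudget(0.25, 3.00, 15),
    "reviewer":     AgentBudget(0.30, 2.00, 5),
}

def within_budget(agent_type: str, context_fraction: float, cost_usd: float) -> bool:
    b = BUDGETS[agent_type]
    return context_fraction <= b.context_target and cost_usd <= b.max_cost_usd
```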
Input/Output Patterns
Monitors (scheduled data fetching):
- Inputs: Config only (no upstream context)
- Outputs: Structured reports (markdown + JSON)
- Context: Fresh each run

Analyzers (synthesis agents):
- Inputs: Upstream data (bounded, recent only)
- Outputs: Analysis + structured data
- Context: Read 5 previous reports max

Orchestrators (lead agents):
- Inputs: Briefs, requests
- Outputs: Issues, coordination artifacts
- Context: Pass minimal viable context to workers
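A sketch of a monitor run following this pattern. The `fetch_metrics` step and report paths are placeholders; the run starts with a fresh context and produces only structured artifacts for downstream analyzers.

```python
import json
from datetime import date
from pathlib import Path

def fetch_metrics() -> dict:
    """Placeholder for the monitor's single data-fetching step."""
    return {"open_issues": 12, "failed_runs": 1}

def run_monitor(report_dir: Path = Path("reports")) -> None:
    # Fresh context each run: the only input is configuration, the only
    # output is a structured report that downstream agents can read.
    report_dir.mkdir(exist_ok=True)
    data = fetch_metrics()
    stamp = date.today().isoformat()
    (report_dir / f"{stamp}.json").write_text(json.dumps(data, indent=2))
    (report_dir / f"{stamp}.md").write_text(
        f"# Monitor report {stamp}\n\n"
        f"- Open issues: {data['open_issues']}\n"
        f"- Failed runs: {data['failed_runs']}\n"
    )

run_monitor()
```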
Handoff Protocol
When passing context between agents, structure matters:
Good Handoff (Minimal Viable Context)
```markdown
## Task
Investigate context engineering patterns for multi-agent systems

## Constraints
- Max 2 hours
- Focus on practical techniques
- Cite sources

## Context Summary
We're building production multi-agent systems. Need to understand
how to manage context across agents without degradation.

## Expected Output
Deep-dive document with evidence-backed recommendations
```
Total: ~150 tokens
Bad Handoff (Context Hoarding)
```markdown
## Full Conversation History
[10,000 tokens of prior discussion]

## All Files Read
[5,000 tokens of file contents]

## Everything Just In Case
[3,000 tokens of tangentially related information]
```
Total: ~18,000 tokens
The bad handoff poisons the sub-agent’s context before it even starts working.
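A sketch of enforcing the good-handoff shape programmatically, using a rough 4-characters-per-token estimate. Rejecting oversized handoffs forces the parent to summarize before delegating.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token.
    return len(text) // 4

def build_handoff(task: str, constraints: list[str], summary: str,
                  expected_output: str, max_tokens: int = 2_000) -> str:
    handoff = (
        f"## Task\n{task}\n\n"
        "## Constraints\n" + "\n".join(f"- {c}" for c in constraints) + "\n\n"
        f"## Context Summary\n{summary}\n\n"
        f"## Expected Output\n{expected_output}\n"
    )
    # Refuse to hand off hoarded context: pass conclusions, not raw data.
    if estimate_tokens(handoff) > max_tokens:
        raise ValueError("Handoff too large: summarize before delegating")
    return handoff
```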
Warning Signs
Yellow Zone (30-40% context)
Watch for:
- Reading 5+ files without producing output
- Multiple large file reads in sequence
- Tool outputs accumulating without summarization
- Conversation going 10+ turns on same task
- Large search result sets being read in full
Action: Pause, summarize current state, consider spawning sub-agent.
Red Zone (>40% context)
Immediate actions:
- Stop accumulating
- Summarize current state
- Spawn fresh sub-agent with summary only
- Or: trigger manual checkpoint
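A sketch of the zone check, assuming a 200K-token window; the returned zone maps directly to the actions above.

```python
def context_zone(used_tokens: int, window_tokens: int = 200_000) -> str:
    """Classify utilization into the zones described above."""
    pct = used_tokens / window_tokens
    if pct < 0.30:
        return "green"   # keep working
    if pct < 0.40:
        return "yellow"  # pause, summarize, consider a sub-agent
    return "red"         # stop accumulating, hand off with a summary only

for used in (40_000, 70_000, 120_000):
    print(used, context_zone(used))
```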
Anti-Patterns
1. Context Hoarding
Pattern: Reading files “just in case”. Fix: Only read what you need now.
2. History Dependency
Pattern: Relying on “what we discussed earlier”. Fix: State it directly or write it to an external file.
3. Output Verbosity
Pattern: Including full file contents in responses. Fix: Summaries with file references.
4. Tool Output Accumulation
Pattern: Running many tools without processing results. Fix: Process → summarize → proceed.
5. Bloated Tool Sets
Pattern: Tools with overlapping functionality. Fix: Minimal viable tool set, unambiguous selection.
Measuring Context Efficiency
Track these metrics:
| Metric | Target | How to Measure |
|---|---|---|
| Context utilization | < 40% average | Trace analysis |
| Cost per outcome | Decreasing trend | Budget tracking |
| Sub-agent spawn rate | 20-30% of complex tasks | Execution logs |
| Handoff token size | < 2,000 tokens | Trace analysis |
| Compression ratio | 10:1 for tool outputs | Before/after comparison |
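A sketch of computing these metrics from execution traces. The trace schema shown is an assumption; adapt the field names to whatever your tracing actually captures.

```python
from statistics import mean

# Assumed trace schema: one dict per agent run.
traces = [
    {"used_tokens": 55_000, "window": 200_000, "handoff_tokens": 1_200,
     "tool_output_tokens": 9_000, "tool_summary_tokens": 800},
    {"used_tokens": 90_000, "window": 200_000, "handoff_tokens": 3_500,
     "tool_output_tokens": 12_000, "tool_summary_tokens": 1_500},
]

avg_utilization = mean(t["used_tokens"] / t["window"] for t in traces)
oversized_handoffs = sum(t["handoff_tokens"] > 2_000 for t in traces)
compression_ratio = mean(t["tool_output_tokens"] / t["tool_summary_tokens"] for t in traces)

print(f"avg context utilization: {avg_utilization:.0%}")      # target < 40%
print(f"handoffs over 2,000 tokens: {oversized_handoffs}")     # target 0
print(f"tool output compression: {compression_ratio:.1f}:1")   # target ~10:1
```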
Implementation Checklist
Agent Design
- Context target defined (% of window)
- Token budget set (per run)
- Inputs are bounded and specific
- Outputs are structured and summarized
- Tools are minimal and unambiguous
During Execution
- Reading only necessary files
- Summarizing after subtasks
- Dropping tool outputs after processing
- Spawning sub-agents for deep work
- Writing to external files for persistence
Handoffs
- Passing conclusions, not raw data
- Specifying constraints clearly
- Defining expected output format
- Limiting scope to single concern
The Economics
Context engineering isn’t just about quality—it’s about cost.
At $3-15 per million tokens:
- An agent running at 80% context uses 2x the tokens of one at 40%
- Multi-agent systems multiply this across every agent
- Inefficient handoffs compound these costs with every delegation
A well-engineered multi-agent system operating in the Smart Zone can cost 60-70% less than an equivalent unoptimized system while producing better results.
The 40% Smart Zone isn’t just optimal for reasoning—it’s optimal for economics.
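A back-of-envelope illustration, assuming a 200K-token window and $3 per million input tokens (illustrative pricing; adjust for your model and for output tokens):

```python
# Same window, same price, different utilization.
WINDOW = 200_000
PRICE_PER_TOKEN = 3 / 1_000_000

for utilization in (0.40, 0.80):
    tokens = int(WINDOW * utilization)
    print(f"{utilization:.0%} utilization -> {tokens:,} tokens "
          f"~ ${tokens * PRICE_PER_TOKEN:.2f} per full-context call")
# An agent sitting at 80% pays roughly twice as much per call as one at 40%,
# and a multi-agent system multiplies that across every agent and handoff.
```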
Summary
| Technique | When to Use | Token Savings |
|---|---|---|
| Write (Externalize) | Multi-step reasoning, cross-session | 50-80% |
| Select (Retrieve) | Large knowledge bases, many tools | 30-60% |
| Compress (Summarize) | After subtasks, tool outputs | 40-70% |
| Isolate (Sub-agents) | Complex tasks, parallel work | Enables 15x parallelization |
Context engineering is the discipline of curating tokens, not maximizing them. Multi-agent systems make this critical—and make the payoff substantial.
Sources: Anthropic Engineering (Effective Context Engineering for AI Agents, 2025), LangChain (Context Engineering for Agents), Chroma Research (Context Rot), internal analysis of 45 agents across 8 squads.