Context Engineering for Multi-Agent Systems

By Agents Squads · 11 min read

The Context Problem

As Andrej Karpathy puts it: the LLM context window is like RAM—it requires careful curation, not maximization.

This matters more for multi-agent systems. Anthropic’s research shows multi-agent architectures can consume 15x more tokens than single-agent approaches. Without context engineering, costs spiral, performance degrades, and agents fail.

The goal isn’t to fill context windows. It’s to find the smallest possible set of high-signal tokens that maximize desired outcomes.

Context Failure Modes

Before discussing solutions, understand the problems:

| Failure Mode | What Happens |
| --- | --- |
| Context Rot | Model performance degrades as token count increases |
| Context Poisoning | Hallucinations enter stored information and compound over time |
| Context Distraction | Excessive information overwhelms the model |
| Context Confusion | Irrelevant content influences responses |
| Context Clash | Conflicting information within the same context |

These aren’t theoretical. We’ve observed all of them in production multi-agent systems. Context rot alone caused a 23% accuracy drop in one research agent operating above 60% context utilization.

The 40% Smart Zone

Claude Code triggers auto-compact at 95% context saturation. But performance degrades long before that threshold.

Based on our testing across 45 agents in 8 squads:

| Context Utilization | Performance Impact |
| --- | --- |
| 0-30% | Optimal reasoning quality |
| 30-40% | Smart Zone: good balance of context and reasoning |
| 40-60% | Noticeable degradation on complex tasks |
| 60-80% | Significant quality loss, more hallucinations |
| 80-95% | Severe degradation, unreliable outputs |
| 95%+ | Auto-compact triggers, context loss |

Recommendation: Design agents to operate in the 40% Smart Zone. Build in checkpoints and summarization before hitting 40%.

The Four Techniques

Based on Anthropic’s official guidance, context management falls into four categories.

1. Write Context (Externalize)

Pattern: Persist information outside the context window. Retrieve when needed.

| Approach | Implementation | Use Case |
| --- | --- | --- |
| Scratchpads | Tool calls write to runtime state | Multi-step reasoning |
| Long-term memory | Agent synthesizes to storage | Cross-session learning |
| Todo lists | Progress trackers as files | Complex tasks |
| Notes files | NOTES.md, CLAUDE.md patterns | Project context |

In practice: agents should write summaries to files instead of accumulating raw findings in context.

Instead of: Keep all findings in working memory
Do: Write findings to files, keep summary in context

A research agent investigating 10 sources should write each source analysis to a file, keeping only a 2-3 sentence summary in context. Total context: ~500 tokens instead of ~15,000.
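
A minimal sketch of the pattern, assuming a hypothetical `summarize` helper (for example, a cheap model call) and a local `notes/` directory:

```python
from pathlib import Path

NOTES_DIR = Path("notes")
NOTES_DIR.mkdir(exist_ok=True)

def record_finding(source_id: str, full_analysis: str, summarize) -> str:
    """Persist the full analysis to disk; return only a short summary."""
    path = NOTES_DIR / f"{source_id}.md"
    path.write_text(full_analysis)      # full detail lives outside the window
    summary = summarize(full_analysis)  # hypothetical helper: a cheap model call
    # Only this one line re-enters the context window.
    return f"[{source_id}] {summary} (full analysis: {path})"
```

Ten sources yield ten files plus ten one-line pointers in context, which is roughly where the ~500-token figure above comes from.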

2. Select Context (Retrieve)

Pattern: Dynamically fetch only relevant information at runtime.

| Approach | Implementation | Use Case |
| --- | --- | --- |
| RAG retrieval | Embeddings + vector search | Large knowledge bases |
| Static files | CLAUDE.md loaded upfront | Project conventions |
| Just-in-time | Glob/grep during execution | Code exploration |
| Tool descriptions | RAG on tool docs | Large tool sets |

Evidence: RAG on tool descriptions showed 3x improvement in tool selection accuracy for agents with 20+ available tools.

In practice:
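
One way to implement RAG over tool descriptions, as a sketch: `embed` stands in for a hypothetical embedding function returning a vector, and scoring is plain cosine similarity.

```python
import numpy as np

def top_k_tools(task: str, tools: dict[str, str], embed, k: int = 5) -> list[str]:
    """tools maps tool name -> description; return the k most relevant names."""
    task_vec = np.asarray(embed(task))

    def score(description: str) -> float:
        vec = np.asarray(embed(description))  # in production, precompute and cache
        return float(np.dot(task_vec, vec) /
                     (np.linalg.norm(task_vec) * np.linalg.norm(vec)))

    return sorted(tools, key=lambda name: score(tools[name]), reverse=True)[:k]
```

Only the selected tools' descriptions enter the prompt; the rest of a 20+ tool catalog stays out of context entirely.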

3. Compress Context (Summarize)

Pattern: Condense information while preserving critical decisions.

| Approach | Threshold | Trade-off |
| --- | --- | --- |
| Auto-compact | 95% context saturation | May lose subtle context |
| Recursive summarization | At agent boundaries | Compression artifacts |
| Context trimming | Remove older messages | Lost history |
| Tool result clearing | After processing | Safest approach |

In practice:
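
A sketch of threshold-driven compression combining two rows from the table above: the Smart Zone boundary as the trigger, tool-result clearing as the action. `count_tokens`, `summarize`, and the `processed` flag are assumptions, not any specific framework's API.

```python
SMART_ZONE = 0.40  # the article's recommended ceiling

def maybe_compress(messages: list[dict], window_size: int,
                   count_tokens, summarize) -> list[dict]:
    """Clear processed tool results once utilization leaves the Smart Zone."""
    used = sum(count_tokens(m["content"]) for m in messages)
    if used / window_size < SMART_ZONE:
        return messages  # still inside the Smart Zone; leave context alone
    compressed = []
    for m in messages:
        if m.get("role") == "tool" and m.get("processed"):
            # "Tool result clearing": the safest row in the table above.
            m = {**m, "content": summarize(m["content"])}
        compressed.append(m)
    return compressed
```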

Warning: Overly aggressive compression risks losing subtle but critical context. Test compression on representative tasks before deploying.

4. Isolate Context (Sub-agents)

Pattern: Specialized agents with clean, focused context windows.

| Approach | Benefit | Cost |
| --- | --- | --- |
| Task-specific sub-agents | Deep focus | Coordination overhead |
| Parallel execution | More total tokens on the problem | 15x token multiplier |
| Condensed handoffs | Clean interfaces | Information loss risk |

In practice:
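
A sketch of isolation via a condensed handoff, using the minimal-viable-context format shown in the Handoff Protocol section below; `run_agent` stands in for whatever starts a sub-agent from a fresh context window.

```python
def spawn_subagent(task: str, constraints: list[str], summary: str, run_agent) -> str:
    """Dispatch a sub-agent with a condensed handoff, not the full history."""
    handoff = "\n".join([
        "## Task", task,
        "",
        "## Constraints", *[f"- {c}" for c in constraints],
        "",
        "## Context Summary", summary,
    ])
    # The sub-agent starts clean: ~150 tokens of handoff instead of
    # thousands of tokens of inherited parent history.
    return run_agent(prompt=handoff)
```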

Evidence: Splitting complex research across sub-agents (each with isolated context) significantly outperformed single-agent approaches—90% improvement on multi-source synthesis tasks.

Agent-Specific Budgets

Different agent types have different context needs:

| Agent Type | Context Target | Budget/Run | Timeout | Rationale |
| --- | --- | --- | --- | --- |
| Monitor | < 20% | $0.50-1.00 | 5 min | Fetch → Report (focused) |
| Analyzer | < 30% | $1.00-2.00 | 10 min | Read upstream → Synthesize |
| Generator | < 40% | $2.00-5.00 | 15 min | Create artifacts (needs more context) |
| Orchestrator | < 25% | $2.00-3.00 | 15 min | Coordinate, don't accumulate |
| Reviewer | < 30% | $1.00-2.00 | 5 min | Diff + rules (bounded input) |
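
One possible way to encode these budgets as configuration; the field names are illustrative rather than any particular framework's schema:

```python
# Per-agent-type limits mirroring the table above.
AGENT_BUDGETS = {
    "monitor":      {"context_target": 0.20, "budget_usd": (0.50, 1.00), "timeout_min": 5},
    "analyzer":     {"context_target": 0.30, "budget_usd": (1.00, 2.00), "timeout_min": 10},
    "generator":    {"context_target": 0.40, "budget_usd": (2.00, 5.00), "timeout_min": 15},
    "orchestrator": {"context_target": 0.25, "budget_usd": (2.00, 3.00), "timeout_min": 15},
    "reviewer":     {"context_target": 0.30, "budget_usd": (1.00, 2.00), "timeout_min": 5},
}
```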

Input/Output Patterns

Monitors (scheduled data fetching):

- Inputs: Config only (no upstream context)
- Outputs: Structured reports (markdown + JSON)
- Context: Fresh each run

Analyzers (synthesis agents):

- Inputs: Upstream data (bounded, recent only)
- Outputs: Analysis + structured data
- Context: Read 5 previous reports max

Orchestrators (lead agents):

- Inputs: Briefs, requests
- Outputs: Issues, coordination artifacts
- Context: Pass minimal viable context to workers

Handoff Protocol

When passing context between agents, structure matters:

Good Handoff (Minimal Viable Context)

```markdown
## Task
Investigate context engineering patterns for multi-agent systems

## Constraints
- Max 2 hours
- Focus on practical techniques
- Cite sources

## Context Summary
We're building production multi-agent systems. Need to understand
how to manage context across agents without degradation.

## Expected Output
Deep-dive document with evidence-backed recommendations
```

Total: ~150 tokens

Bad Handoff (Context Hoarding)

```markdown
## Full Conversation History
[10,000 tokens of prior discussion]

## All Files Read
[5,000 tokens of file contents]

## Everything Just In Case
[3,000 tokens of tangentially related information]
```

Total: ~18,000 tokens

The bad handoff poisons the sub-agent’s context before it even starts working.
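
A simple guard that enforces the handoff-size target from the metrics section below (< 2,000 tokens); `count_tokens` is assumed to be your model provider's tokenizer.

```python
def check_handoff(handoff: str, count_tokens, limit: int = 2000) -> str:
    """Reject oversized handoffs before they poison a sub-agent's context."""
    n = count_tokens(handoff)
    if n > limit:
        raise ValueError(f"Handoff is {n} tokens (limit {limit}); condense it first.")
    return handoff
```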

Warning Signs

Yellow Zone (30-40% context)

Watch for:

- Files being read "just in case" rather than on demand
- Tool outputs accumulating without being processed
- Responses leaning on "what we discussed earlier"

Action: Pause, summarize the current state, and consider spawning a sub-agent.

Red Zone (>40% context)

Immediate actions:

  1. Stop accumulating
  2. Summarize current state
  3. Spawn fresh sub-agent with summary only
  4. Or: trigger manual checkpoint

Anti-Patterns

1. Context Hoarding

Pattern: Reading files “just in case”
Fix: Only read what you need now

2. History Dependency

Pattern: Relying on “what we discussed earlier”
Fix: State it directly or write to external file

3. Output Verbosity

Pattern: Including full file contents in responses
Fix: Summaries with file references

4. Tool Output Accumulation

Pattern: Running many tools without processing results
Fix: Process → summarize → proceed

5. Bloated Tool Sets

Pattern: Tools with overlapping functionality
Fix: Minimal viable tool set, unambiguous selection

Measuring Context Efficiency

Track these metrics:

| Metric | Target | How to Measure |
| --- | --- | --- |
| Context utilization | < 40% average | Trace analysis |
| Cost per outcome | Decreasing trend | Budget tracking |
| Sub-agent spawn rate | 20-30% of complex tasks | Execution logs |
| Handoff token size | < 2,000 tokens | Trace analysis |
| Compression ratio | 10:1 for tool outputs | Before/after comparison |
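
A sketch of how the first metric might be computed from execution traces, assuming each trace record carries a `prompt_tokens` count (a hypothetical field name):

```python
def utilization_report(traces: list[dict], window_size: int) -> dict:
    """Average utilization and the share of runs that left the Smart Zone."""
    if not traces:
        return {"avg_utilization": 0.0, "over_smart_zone": 0.0}
    ratios = [t["prompt_tokens"] / window_size for t in traces]
    return {
        "avg_utilization": sum(ratios) / len(ratios),      # target: < 0.40
        "over_smart_zone": sum(r > 0.40 for r in ratios) / len(ratios),
    }
```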

Implementation Checklist

Agent Design

- Assign each agent a context target, budget, and timeout by type
- Keep tool sets minimal with unambiguous selection
- Decide upfront what gets externalized to files versus kept in context

During Execution

- Track context utilization and summarize before crossing 40%
- Process and compress tool outputs before moving on
- Spawn sub-agents for isolatable subtasks instead of accumulating

Handoffs

- Pass minimal viable context: task, constraints, summary, expected output
- Keep handoffs under 2,000 tokens
- Never forward full conversation history

The Economics

Context engineering isn’t just about quality—it’s about cost.

At $3-15 per million tokens, the 15x token multiplier of multi-agent systems turns small inefficiencies into large bills.
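
A back-of-envelope comparison using the article's own figures; the per-workflow token volume and the 65% savings midpoint are assumptions for illustration.

```python
PRICE_PER_M_TOKENS = 3.00  # USD, low end of the $3-15 range
WORKFLOW_M_TOKENS = 1.0    # millions of tokens per single-agent run (assumed)

single_agent = WORKFLOW_M_TOKENS * PRICE_PER_M_TOKENS
unoptimized_squad = single_agent * 15        # the 15x token multiplier
smart_zone_squad = unoptimized_squad * 0.35  # ~65% savings (60-70% midpoint)

print(f"single agent:      ${single_agent:.2f}")       # $3.00
print(f"unoptimized squad: ${unoptimized_squad:.2f}")  # $45.00
print(f"Smart Zone squad:  ${smart_zone_squad:.2f}")   # $15.75
```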

A well-engineered multi-agent system operating in the Smart Zone can cost 60-70% less than an equivalent unoptimized system while producing better results.

The 40% Smart Zone isn’t just optimal for reasoning—it’s optimal for economics.


Summary

| Technique | When to Use | Token Savings |
| --- | --- | --- |
| Write (Externalize) | Multi-step reasoning, cross-session | 50-80% |
| Select (Retrieve) | Large knowledge bases, many tools | 30-60% |
| Compress (Summarize) | After subtasks, tool outputs | 40-70% |
| Isolate (Sub-agents) | Complex tasks, parallel work | Enables 15x parallelization |

Context engineering is the discipline of curating tokens, not maximizing them. Multi-agent systems make this critical—and make the payoff substantial.


Sources: Anthropic Engineering (Effective Context Engineering for AI Agents, 2025), LangChain (Context Engineering for Agents), Chroma Research (Context Rot), internal analysis of 45 agents across 8 squads.
