Token Economics: From Cost Centers to Value Centers

By Agents Squads · 8 min

The Cost Trap

The instinct is natural: tokens cost money, so minimize tokens.

This thinking is wrong.

Token minimization optimizes for the wrong metric. The right question isn’t “how few tokens can we use?” It’s “how much value can we create per token?”

The Real Math

Consider two agents solving the same problem:

Agent A (Token-Minimized)

Agent B (Context-Rich)

Agent A looks cheaper. But factor in rework:

Still close. Now factor in human time:

Agent B costs 4.7x less when you include human time.

The lesson: Token cost is noise. Outcome cost is signal.
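
To make that concrete, here is a minimal sketch of the outcome-cost arithmetic. Every number in it is an illustrative assumption (token counts, rework rates, review minutes, and a $100/hour engineer), not the figures from the comparison above; the point is that human time, not token spend, drives the total.

# Hypothetical outcome-cost comparison; all numbers below are illustrative assumptions.
TOKEN_RATE = 15 / 1_000_000  # blended $ per token (assumed)

def outcome_cost(tokens, rework_probability, human_minutes, hourly_rate=100):
    """Expected cost of one completed task, including rework and human time."""
    token_cost = tokens * TOKEN_RATE
    expected_rework = rework_probability * token_cost   # re-run the task when it fails
    human_cost = (human_minutes / 60) * hourly_rate     # review and repair time
    return token_cost + expected_rework + human_cost

# Agent A: minimal context, frequent rework, heavy human review (assumed values)
a = outcome_cost(tokens=5_000, rework_probability=0.5, human_minutes=30)
# Agent B: rich context, rare rework, light human review (assumed values)
b = outcome_cost(tokens=50_000, rework_probability=0.05, human_minutes=5)

print(f"Agent A per outcome: ${a:.2f}")  # token cost is pennies; the 30 minutes of review dominates
print(f"Agent B per outcome: ${b:.2f}")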

Value Per Token

The metric that matters:

Value Per Token = Business Outcome Value / Tokens Consumed

For a customer service agent:

For every token, you’re generating $0.0033 in value. That’s a 220x return.

When value per token is high, the optimization target is throughput, not token reduction.
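
To make the metric concrete, here is a minimal sketch for a customer service agent. The inputs are assumptions chosen to land near the figures above: roughly $10 of value per resolved ticket, about 3,000 tokens per resolution, and a $15/M token rate.

# Value per token for a customer service agent (all inputs are assumed, illustrative values).
value_per_resolution = 10.00     # $ of value per resolved ticket (assumption)
tokens_per_resolution = 3_000    # tokens consumed per resolution (assumption)
token_rate = 15 / 1_000_000      # $ per token (assumption)

value_per_token = value_per_resolution / tokens_per_resolution
roi = value_per_token / token_rate

print(f"Value per token: ${value_per_token:.4f}")  # ≈ $0.0033
print(f"Return multiple: {roi:.0f}x")              # ≈ 222x, roughly the 220x return above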

Model Selection Economics

Different models have different economics:

Model              Input Cost ($/M)   Output Cost ($/M)   Best For
Claude Opus 4.5    $15                $75                 Complex reasoning, high-stakes
Claude Sonnet 4    $3                 $15                 General production work
Claude Haiku 3.5   $0.25              $1.25               High volume, simple tasks
GPT-4o             $2.50              $10                 Multimodal, fast iteration

The Model Arbitrage Strategy

Use expensive models for high-value decisions, cheap models for low-value operations:

Example: Code Review Pipeline

  1. Haiku scans for obvious issues (syntax, formatting): $0.001 per file
  2. Sonnet reviews logic and architecture: $0.02 per file
  3. Opus analyzes security-critical code: $0.10 per file

Total cost for reviewing 100 files:

vs. Opus for all 100 files: $10.00

An 8x cost reduction with equivalent or better outcomes, because each model operates where it provides the most value.
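
A minimal sketch of the tiered pipeline's cost math. The per-file rates come from the pipeline above; the escalation counts (how many files each tier actually sees) are assumptions, and the exact savings depend on that mix.

# Tiered code-review cost vs. running the top model on everything.
# Per-file rates are from the pipeline above; escalation counts are assumed.
COST_PER_FILE = {"haiku": 0.001, "sonnet": 0.02, "opus": 0.10}

def tiered_cost(total_files, escalated_to_sonnet, escalated_to_opus):
    """Every file gets a Haiku scan; only flagged files escalate to costlier models."""
    return (total_files * COST_PER_FILE["haiku"]
            + escalated_to_sonnet * COST_PER_FILE["sonnet"]
            + escalated_to_opus * COST_PER_FILE["opus"])

tiered = tiered_cost(total_files=100, escalated_to_sonnet=40, escalated_to_opus=5)
flat_opus = 100 * COST_PER_FILE["opus"]

print(f"Tiered pipeline: ${tiered:.2f}")      # $1.40 with these assumed escalation counts
print(f"Opus everywhere: ${flat_opus:.2f}")   # $10.00
print(f"Savings: {flat_opus / tiered:.1f}x")  # roughly 7x here; a leaner escalation mix gets you to 8x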

Batch vs Interactive Economics

Token costs vary by usage pattern:

Interactive (Real-time)

Batch (Background)

Anthropic’s prompt caching reduces input token costs by 90% for repeated context. For batch operations with consistent prompts, this is transformative:

Without caching: $15/M input tokens
With cached reads (a 90% discount): $1.50/M tokens effective

A nightly batch job running 1M tokens of analysis drops from roughly $15 per run to about $1.50 once its repeated context is served from the cache.
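
The arithmetic, sketched out: the base rate and the assumption that cached reads cost roughly 10% of the base input price follow the figures above, while the fraction of the prompt that is repeated context is the variable you control.

# Effective input-token cost with prompt caching (illustrative assumptions throughout).
BASE_RATE = 15.0             # $ per million input tokens (assumed base price)
CACHED_READ_DISCOUNT = 0.9   # cached reads cost ~10% of base, i.e. a 90% discount

def effective_rate(cached_fraction):
    """Blended $/M-token rate when `cached_fraction` of the prompt is served from cache."""
    cached = cached_fraction * BASE_RATE * (1 - CACHED_READ_DISCOUNT)
    uncached = (1 - cached_fraction) * BASE_RATE
    return cached + uncached

nightly_tokens_millions = 1.0  # the nightly 1M-token batch job above
for frac in (0.0, 0.9, 1.0):
    cost = effective_rate(frac) * nightly_tokens_millions
    print(f"{frac:>4.0%} cached → ${cost:.2f} per run")
# 0% cached → $15.00; 90% cached → $2.85; fully cached context → $1.50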

The Budget Framework

Structure AI spending like engineering infrastructure:

1. Fixed Costs (Predictable)

Budget these like any infrastructure cost. They should decrease as a percentage of value over time.

2. Variable Costs (Scales with Value)

These should scale with business value. More interactions = more cost = more value.

3. Investment Costs (Capability Building)

Treat these as R&D. Expect payback over quarters, not days.
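
One way to sketch the framework in practice is a simple three-bucket budget; the line items and dollar amounts below are hypothetical placeholders, not recommendations.

# Hypothetical monthly AI budget, bucketed by the framework above.
budget = {
    "fixed": {            # predictable, infrastructure-style costs (assumed items)
        "scheduled_batch_jobs": 400,
        "monitoring_and_evals": 150,
    },
    "variable": {         # scales with usage, and therefore with value (assumed items)
        "customer_interactions": 2_500,
        "internal_agent_runs": 800,
    },
    "investment": {       # R&D-style capability building, paid back over quarters (assumed)
        "prompt_and_agent_development": 1_200,
        "fine_tuning_experiments": 600,
    },
}

for bucket, items in budget.items():
    print(f"{bucket:>10}: ${sum(items.values()):,}")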

Cost Anomaly Detection

Watch for these patterns:

Red Flags

Pattern                          Likely Cause          Action
Sudden 10x spike                 Agent loop            Kill and investigate
Gradual daily increase           Context bloat         Review prompts
High variation between runs      Inconsistent inputs   Standardize
Cost increasing, outcomes flat   Model inefficiency    Re-evaluate model choice
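
A minimal detector for the first two patterns, a sudden spike against a trailing baseline and a steady upward drift. The window size and thresholds are assumptions to tune against your own spend history.

# Simple daily-spend anomaly checks; thresholds and window sizes are assumed, not prescriptive.
from statistics import mean

def check_spend(daily_costs, spike_factor=10.0, drift_factor=1.5, window=7):
    """Flag a sudden spike or a gradual increase in a list of daily token costs."""
    alerts = []
    if len(daily_costs) > window:
        baseline = mean(daily_costs[-window - 1:-1])  # the `window` days before today
        if daily_costs[-1] > spike_factor * baseline:
            alerts.append("spike: possible agent loop, kill and investigate")
        recent = mean(daily_costs[-window:])
        earlier = mean(daily_costs[-2 * window:-window]) if len(daily_costs) >= 2 * window else baseline
        if recent > drift_factor * earlier:
            alerts.append("drift: possible context bloat, review prompts")
    return alerts

print(check_spend([12, 13, 11, 12, 14, 13, 12, 13, 150]))
# flags the spike; the jump also skews the 7-day average, so the drift check fires too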

Optimization Triggers

When to optimize:

When NOT to optimize:

Don’t optimize working systems for marginal token savings.

The Agent Cost Stack

For multi-agent systems, costs compound:

Total Cost = Σ (Agent Tokens × Agent Cost Rate)
           + Coordination Overhead
           + Retry Costs
           + Human Escalation Costs
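
Translated directly into code, with placeholder numbers: the per-agent token counts, rates, and overhead figures below are assumptions to be replaced with your own telemetry.

# Multi-agent cost stack; all numbers are placeholder assumptions.
def total_cost(agents, coordination_overhead, retry_costs, human_escalation_costs):
    """Sum per-agent token spend plus the overheads that compound around it."""
    agent_spend = sum(tokens * rate_per_million / 1_000_000 for tokens, rate_per_million in agents)
    return agent_spend + coordination_overhead + retry_costs + human_escalation_costs

agents = [
    (120_000, 15.0),   # orchestrator on a pricier model (tokens, $/M), assumed
    (300_000, 3.0),    # worker agents on a mid-tier model, assumed
    (500_000, 0.25),   # high-volume triage on a cheap model, assumed
]
cost = total_cost(agents, coordination_overhead=0.40, retry_costs=0.25, human_escalation_costs=2.00)
print(f"Workflow cost per run: ${cost:.2f}")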

Coordination Overhead

Each agent handoff costs tokens:

For a 5-agent workflow with 4 handoffs:

Design implication: Fewer, more capable agents often beat many specialized agents on cost efficiency.

When Multi-Agent Is Worth It

Multi-agent systems cost more in tokens but can deliver more value:

Scenario           Single-Agent      Multi-Agent             Winner
Simple task        Lower cost        Higher cost             Single
Complex research   Lower quality     Higher quality          Multi
Time-critical      Sequential        Parallel                Multi
High-stakes        One perspective   Multiple perspectives   Multi

The break-even: Multi-agent systems win when the quality/speed improvement exceeds the coordination overhead.
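
Stated as a check, assuming you can estimate value and cost in the same units (dollars per task); the inputs below are illustrative.

# Break-even test: does the quality/speed gain pay for the coordination overhead?
# All inputs are estimates in dollars per task (assumed, not measured).
def prefer_multi_agent(single_cost, single_value, multi_cost, multi_value):
    return (multi_value - multi_cost) > (single_value - single_cost)

# Complex research task: multi-agent costs more but produces a materially better output.
print(prefer_multi_agent(single_cost=0.50, single_value=8.00,
                         multi_cost=2.00, multi_value=15.00))  # True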

Pricing Your AI Products

If you’re building AI products, token economics affect pricing strategy:

Cost-Plus Pricing

Price = (Token Cost × Markup) + Fixed Costs

Value-Based Pricing

Price = (Value Delivered × Capture Rate)

Outcome-Based Pricing

Price = (Per Outcome Fee)

Sierra’s $100M ARR came from outcome-based pricing: pay per resolved ticket. They capture value, not cost.
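
A sketch of the three formulas applied to the same request. The unit economics (token spend per request, value delivered, capture rate, per-outcome fee) are illustrative assumptions.

# Three pricing models applied to the same hypothetical request (all inputs assumed).
token_cost = 0.05        # model spend per request
fixed_cost_share = 0.02  # amortized infrastructure per request
value_delivered = 6.00   # what the outcome is worth to the customer

cost_plus = token_cost * 3 + fixed_cost_share   # 3x markup on token spend
value_based = value_delivered * 0.20            # capture 20% of delivered value
outcome_based = 0.99                            # flat fee per successful outcome

print(f"Cost-plus:     ${cost_plus:.2f}")      # $0.17, anchored to your costs
print(f"Value-based:   ${value_based:.2f}")    # $1.20, anchored to customer value
print(f"Outcome-based: ${outcome_based:.2f}")  # $0.99, charged only when it works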

Key Metrics Dashboard

Track these for AI economics:

Metric                  Formula                                      Target
Cost per outcome        Total tokens × rate / successful outcomes    Decreasing
Value per token         Business value / tokens consumed             Increasing
Model efficiency        Outcomes / model-specific tokens             Compare across models
Retry rate              Retried tasks / total tasks                  < 10%
Human escalation rate   Human interventions / total tasks            < 5%
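
A minimal sketch of computing these from per-task usage logs; the record schema and sample values are assumptions about what your logging captures.

# Dashboard metrics from per-task usage logs (record schema and sample data are assumed).
tasks = [
    {"tokens": 4_000, "rate_per_m": 3.0, "success": True,  "retried": False, "escalated": False, "value": 5.0},
    {"tokens": 6_500, "rate_per_m": 3.0, "success": True,  "retried": True,  "escalated": False, "value": 5.0},
    {"tokens": 2_000, "rate_per_m": 3.0, "success": False, "retried": False, "escalated": True,  "value": 0.0},
]

spend = sum(t["tokens"] * t["rate_per_m"] / 1_000_000 for t in tasks)
successes = sum(t["success"] for t in tasks)
total_tokens = sum(t["tokens"] for t in tasks)
total_value = sum(t["value"] for t in tasks)

print(f"Cost per outcome:      ${spend / successes:.4f}")
print(f"Value per token:       ${total_value / total_tokens:.6f}")
print(f"Retry rate:            {sum(t['retried'] for t in tasks) / len(tasks):.0%}")
print(f"Human escalation rate: {sum(t['escalated'] for t in tasks) / len(tasks):.0%}")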

Summary

Token economics principles:

  1. Optimize for value, not cost: Value per token matters more than tokens consumed
  2. Model arbitrage: Use expensive models for high-value decisions only
  3. Batch when possible: Caching and batching dramatically reduce costs
  4. Measure outcomes: Cost per successful outcome, not cost per token
  5. Include human time: Token costs are often <5% of total cost including human time

The goal isn’t to minimize tokens. It’s to maximize value per token.


Related: Context Window Economics covers when to inject context versus discovering it via tools. Profitable AI analyzes companies successfully monetizing AI products.
