The Cost Trap
The instinct is natural: tokens cost money, so minimize tokens.
This thinking is wrong.
Token minimization optimizes for the wrong metric. The right question isn’t “how few tokens can we use?” It’s “how much value can we create per token?”
The Real Math
Consider two agents solving the same problem:
Agent A (Token-Minimized)
- Tokens used: 10,000
- Cost: $0.15
- Success rate: 60%
- Rework required: 40% of tasks
- Effective cost per successful outcome: $0.25
Agent B (Context-Rich)
- Tokens used: 25,000
- Cost: $0.375
- Success rate: 95%
- Rework required: 5% of tasks
- Effective cost per successful outcome: $0.40
Agent A looks cheaper. But failed tasks have to be re-run, so factor in rework:
- Agent A: $0.25 + (0.4 × $0.25) = $0.35 effective cost
- Agent B: $0.40 + (0.05 × $0.40) = $0.42 effective cost
Still close. Now factor in human time:
- Rework requires 15 minutes of human review per failure
- At $100/hour engineering cost, that’s $25 per rework
- Agent A effective cost: $0.35 + (0.4 × $25) = $10.35
- Agent B effective cost: $0.42 + (0.05 × $25) = $1.67
Agent B costs about 6x less when you include human time.
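The arithmetic is worth keeping as a reusable helper. A minimal sketch in Python, using the illustrative figures above (the $25 rework figure is 15 minutes at $100/hour):

```python
def effective_cost(token_cost: float, success_rate: float,
                   human_cost_per_rework: float) -> float:
    """All-in cost per successful task: tokens, retries, and human review."""
    cost_per_success = token_cost / success_rate         # amortize failed runs
    failure_rate = 1 - success_rate
    with_rework = cost_per_success * (1 + failure_rate)  # re-run failed tasks once
    return with_rework + failure_rate * human_cost_per_rework

HUMAN_REVIEW = 25.0  # 15 minutes at $100/hour

print(effective_cost(0.15, 0.60, HUMAN_REVIEW))   # Agent A: ~$10.35
print(effective_cost(0.375, 0.95, HUMAN_REVIEW))  # Agent B: ~$1.66
```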
The lesson: Token cost is noise. Outcome cost is signal.
Value Per Token
The metric that matters:
Value Per Token = Business Outcome Value / Tokens Consumed
For a customer service agent:
- Resolved ticket value: $50 (cost of human agent resolution avoided)
- Tokens consumed: 15,000
- Cost at $10/M tokens: $0.15
- Value per token: $50 / 15,000 = $0.0033
For every token, you’re generating $0.0033 in value. Against $0.15 of token cost for $50 of value, that’s roughly a 333x return.
When value per token is high, the optimization target is throughput, not token reduction.
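Both metrics are one-liners; a sketch using the customer service numbers above:

```python
def value_per_token(outcome_value: float, tokens: int) -> float:
    """Dollars of business value generated per token consumed."""
    return outcome_value / tokens

def return_on_token_spend(outcome_value: float, tokens: int,
                          price_per_m_tokens: float) -> float:
    """Outcome value divided by what the tokens actually cost."""
    return outcome_value / (tokens / 1_000_000 * price_per_m_tokens)

print(value_per_token(50, 15_000))            # ~$0.0033 per token
print(return_on_token_spend(50, 15_000, 10))  # ~333x
```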
Model Selection Economics
Different models have different economics:
| Model | Input Cost/M | Output Cost/M | Best For |
|---|---|---|---|
| Claude Opus 4.1 | $15 | $75 | Complex reasoning, high-stakes |
| Claude Sonnet 4 | $3 | $15 | General production work |
| Claude Haiku 3 | $0.25 | $1.25 | High volume, simple tasks |
| GPT-4o | $2.50 | $10 | Multimodal, fast iteration |
The Model Arbitrage Strategy
Use expensive models for high-value decisions, cheap models for low-value operations:
Example: Code Review Pipeline
- Haiku scans for obvious issues (syntax, formatting): $0.001 per file
- Sonnet reviews logic and architecture: $0.02 per file
- Opus analyzes security-critical code: $0.10 per file
Total cost for reviewing 100 files:
- Haiku (all 100): $0.10
- Sonnet (30 flagged): $0.60
- Opus (5 security-sensitive): $0.50
- Total: $1.20
vs. Opus for all 100 files: $10.00
8x cost reduction with equivalent or better outcomes, because each model operates where it provides the best value.
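The routing itself can be a few lines. A sketch of the escalation pipeline above; the per-file costs and flagging counts are the illustrative ones from this example, not real model rates:

```python
from dataclasses import dataclass

@dataclass
class Tier:
    model: str
    cost_per_file: float

# Illustrative per-file costs from the pipeline above.
HAIKU = Tier("haiku", 0.001)   # syntax and formatting scan
SONNET = Tier("sonnet", 0.02)  # logic and architecture review
OPUS = Tier("opus", 0.10)      # security-critical analysis

def pipeline_cost(total_files: int, flagged: int, security_sensitive: int) -> float:
    """Every file gets the cheap pass; only flagged files escalate."""
    return (total_files * HAIKU.cost_per_file
            + flagged * SONNET.cost_per_file
            + security_sensitive * OPUS.cost_per_file)

print(pipeline_cost(100, 30, 5))  # $1.20
print(100 * OPUS.cost_per_file)   # $10.00 if Opus reviewed everything
```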
Batch vs Interactive Economics
Token costs vary by usage pattern:
Interactive (Real-time)
- User waiting for response
- Latency critical
- Often willing to pay a premium
- Example: Customer service chat
Batch (Background)
- No user waiting
- Can use prompt caching aggressively
- Can retry on cheaper models first
- Example: Nightly report generation
Anthropic’s prompt caching prices cached input tokens at roughly 10% of the base rate, cutting input costs by 90% for repeated context. For batch operations with consistent prompts, this is transformative:
Without caching: $15/M input tokens
With fully cached context: $1.50/M effective
A nightly batch job running 1M tokens of analysis:
- Without caching: $15
- With caching: $1.50
- Monthly savings: $405
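With the Anthropic Python SDK, enabling caching means marking the stable prefix of the prompt as cacheable. A sketch; the model ID and context file are placeholders for your own setup:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical stable corpus reused on every nightly run.
shared_context = open("analysis_context.txt").read()

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # substitute your production model ID
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": shared_context,
        # Mark the large, unchanging prefix as cacheable; later calls that
        # reuse it pay the discounted cache-read rate instead of full price.
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": "Generate tonight's analysis report."}],
)
print(response.usage)  # cache_creation_input_tokens / cache_read_input_tokens
```

Note that writing to the cache carries a modest premium over the base input rate, so the savings accrue on the repeated reads.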
The Budget Framework
Structure AI spending like engineering infrastructure:
1. Fixed Costs (Predictable)
- Monitoring agents (scheduled runs)
- Batch processing (daily/weekly reports)
- Integration maintenance
Budget these like any infrastructure cost. They should decrease as a percentage of value over time.
2. Variable Costs (Scales with Value)
- Customer-facing agents (per interaction)
- Development assistance (per developer)
- Research tasks (per investigation)
These should scale with business value. More interactions = more cost = more value.
3. Investment Costs (Capability Building)
- Training data generation
- New agent development
- Model fine-tuning
Treat these as R&D. Expect payback over quarters, not days.
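A low-tech way to keep the three buckets honest is to tag every workload with its category at logging time. A sketch with hypothetical workload names:

```python
from collections import defaultdict

# Hypothetical mapping from workload to budget category.
CATEGORIES = {
    "nightly_reports": "fixed",
    "support_chat": "variable",
    "fine_tune_experiments": "investment",
}

def monthly_rollup(spend_events):
    """spend_events: iterable of (workload, dollars) tuples."""
    totals = defaultdict(float)
    for workload, dollars in spend_events:
        totals[CATEGORIES.get(workload, "variable")] += dollars
    return dict(totals)

print(monthly_rollup([("nightly_reports", 45.0), ("support_chat", 310.0),
                      ("fine_tune_experiments", 1200.0)]))
```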
Cost Anomaly Detection
Watch for these patterns:
Red Flags
| Pattern | Likely Cause | Action |
|---|---|---|
| Sudden 10x spike | Agent loop | Kill and investigate |
| Gradual daily increase | Context bloat | Review prompts |
| High variation between runs | Inconsistent inputs | Standardize |
| Cost increasing, outcomes flat | Model inefficiency | Re-evaluate model choice |
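These patterns are cheap to check automatically. A sketch that maps the red-flag table onto threshold rules; the thresholds are illustrative and assume at least a few days of history:

```python
from statistics import mean

def cost_alerts(daily_costs: list[float], daily_outcomes: list[float]) -> list[str]:
    """Flag cost patterns from the table above against a trailing baseline."""
    alerts = []
    baseline = mean(daily_costs[:-1])
    today = daily_costs[-1]
    if today > 10 * baseline:
        alerts.append("sudden spike: possible agent loop, kill and investigate")
    if all(a < b for a, b in zip(daily_costs, daily_costs[1:])):
        alerts.append("steady daily increase: review prompts for context bloat")
    if today > 1.2 * baseline and daily_outcomes[-1] <= mean(daily_outcomes[:-1]):
        alerts.append("cost up, outcomes flat: re-evaluate model choice")
    return alerts

print(cost_alerts([10, 11, 12, 14, 150], [40, 41, 40, 42, 41]))
```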
Optimization Triggers
When to optimize:
- Cost per outcome increasing for 3+ months
- Token efficiency < 50% of comparable tasks
- Human intervention rate > 20%
When NOT to optimize:
- Outcomes are excellent
- Cost is within budget
- System is stable
Don’t optimize working systems for marginal token savings.
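The triggers reduce to a guard clause; a sketch, with the inputs assumed to come from your own metrics pipeline:

```python
def should_optimize(months_cost_per_outcome_rising: int,
                    token_efficiency_vs_comparable: float,
                    human_intervention_rate: float) -> bool:
    """Encode the optimization triggers above; stable, in-budget systems stay untouched."""
    return (months_cost_per_outcome_rising >= 3
            or token_efficiency_vs_comparable < 0.50
            or human_intervention_rate > 0.20)
```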
The Agent Cost Stack
For multi-agent systems, costs compound:
Total Cost = Σ (Agent Tokens × Agent Cost Rate)
+ Coordination Overhead
+ Retry Costs
+ Human Escalation Costs
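As a sketch, with per-agent rates in dollars per million tokens and an assumed blended rate for coordination tokens:

```python
def multi_agent_cost(agent_tokens: dict[str, int],
                     rate_per_m: dict[str, float],
                     coordination_tokens: int,
                     retry_cost: float,
                     escalation_cost: float,
                     coordination_rate: float = 3.0) -> float:
    """Sum per-agent token cost, then add coordination, retries, escalations.
    coordination_rate is an assumed blended $/M for handoff and routing tokens."""
    agent_cost = sum(tokens / 1e6 * rate_per_m[name]
                     for name, tokens in agent_tokens.items())
    return (agent_cost
            + coordination_tokens / 1e6 * coordination_rate
            + retry_cost
            + escalation_cost)

# Two agents plus the "typical" 10,000 tokens of handoff overhead from below:
print(multi_agent_cost({"researcher": 40_000, "writer": 20_000},
                       {"researcher": 3.0, "writer": 3.0},
                       coordination_tokens=10_000,
                       retry_cost=0.05, escalation_cost=0.0))  # ~$0.26
```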
Coordination Overhead
Each agent handoff costs tokens:
- Handoff context: 500-2,000 tokens per handoff
- Routing decisions: 200-500 tokens per decision
- Result summarization: 300-1,000 tokens per summary
For a 5-agent workflow with 4 handoffs:
- Minimum overhead: 4 × 1,000 = 4,000 tokens
- Typical overhead: 4 × 2,500 = 10,000 tokens
Design implication: Fewer, more capable agents often beat many specialized agents on cost efficiency.
When Multi-Agent Is Worth It
Multi-agent systems cost more in tokens but can deliver more value:
| Scenario | Single-Agent | Multi-Agent | Winner |
|---|---|---|---|
| Simple task | Lower cost | Higher cost | Single |
| Complex research | Lower quality | Higher quality | Multi |
| Time-critical | Sequential | Parallel | Multi |
| High-stakes | One perspective | Multiple perspectives | Multi |
The break-even: Multi-agent systems win when the quality/speed improvement exceeds the coordination overhead.
Pricing Your AI Products
If you’re building AI products, token economics affect pricing strategy:
Cost-Plus Pricing
Price = (Token Cost × Markup) + Fixed Costs
- Predictable margins
- Doesn’t capture value
- Vulnerable to model price drops
Value-Based Pricing
Price = (Value Delivered × Capture Rate)
- Aligns with customer outcomes
- Higher margins possible
- Requires measuring value
Outcome-Based Pricing
Price = (Per Outcome Fee)
- Customer pays for results
- You bear efficiency risk
- Highest trust signal
Sierra’s $100M ARR came from outcome-based pricing: pay per resolved ticket. They capture value, not cost.
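The three schemes are easy to compare side by side; illustrative numbers only:

```python
def cost_plus(token_cost: float, markup: float, fixed: float) -> float:
    return token_cost * markup + fixed

def value_based(value_delivered: float, capture_rate: float) -> float:
    return value_delivered * capture_rate

def outcome_based(resolved: int, fee_per_resolution: float) -> float:
    return resolved * fee_per_resolution

# The same resolved support ticket under each scheme:
print(cost_plus(0.15, 3.0, 0.10))  # $0.55 -- stable margin, value left uncaptured
print(value_based(50.0, 0.20))     # $10.00 -- price tracks delivered value
print(outcome_based(1, 8.00))      # $8.00 -- charged only when the ticket resolves
```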
Key Metrics Dashboard
Track these for AI economics:
| Metric | Formula | Target |
|---|---|---|
| Cost per outcome | Total tokens × rate / successful outcomes | Decreasing |
| Value per token | Business value / tokens consumed | Increasing |
| Model efficiency | Outcomes / model-specific tokens | Compare across models |
| Retry rate | Retried tasks / total tasks | < 10% |
| Human escalation rate | Human interventions / total tasks | < 5% |
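All of these metrics fall out of a per-task log. A sketch with a hypothetical record shape:

```python
from dataclasses import dataclass

@dataclass
class RunLog:
    """Hypothetical per-task record emitted by your agent runtime."""
    tokens: int
    succeeded: bool
    retried: bool
    escalated: bool
    business_value: float

def dashboard(runs: list[RunLog], rate_per_m: float) -> dict[str, float]:
    total_tokens = sum(r.tokens for r in runs)
    successes = sum(r.succeeded for r in runs) or 1  # avoid divide-by-zero
    return {
        "cost_per_outcome": total_tokens / 1e6 * rate_per_m / successes,
        "value_per_token": sum(r.business_value for r in runs) / total_tokens,
        "retry_rate": sum(r.retried for r in runs) / len(runs),
        "human_escalation_rate": sum(r.escalated for r in runs) / len(runs),
    }
```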
Summary
Token economics principles:
- Optimize for value, not cost: Value per token matters more than tokens consumed
- Model arbitrage: Use expensive models for high-value decisions only
- Batch when possible: Caching and batching dramatically reduce costs
- Measure outcomes: Cost per successful outcome, not cost per token
- Include human time: Token costs are often <5% of total cost including human time
The goal isn’t to minimize tokens. It’s to maximize value per token.
Related: Context Window Economics covers when to inject context vs discover via tools. Profitable AI analyzes companies successfully monetizing AI products.