
The Unit Economics of AI Agents: What It Actually Costs
AI agents are cheaper than you think if you architect them right. Context isolation is the key cost lever.
The naive cost model
Most teams estimate AI agent costs by multiplying token price by average conversation length. This often produces terrifying numbers, but only because the architecture is flawed.
Here's what a typical "just call the API" implementation looks like: you stuff everything into one context window. API documentation goes in. Raw JSON responses from every tool call go in. The entire conversation history, including every prior tool output, goes in. A single conversation can hit 50K+ tokens before you've done anything useful.
Typical single-context conversation:
System prompt .............. 2,000 tokens
API docs (loaded once) ..... 8,000 tokens
Raw JSON response #1 ...... 3,000 tokens
Raw JSON response #2 ...... 4,500 tokens
Pattern matching docs ...... 4,000 tokens
Conversation history ....... 6,000 tokens
User's latest message ...... 200 tokens
─────────────────────────────────────────
Total input per turn: ~28,000 tokens
Cost per turn (Claude): ~$0.08
Cost per 5-turn conversation: ~$0.50+
At $0.50 per conversation, 2,000 conversations a month puts you at $1,000/month in inference costs alone. That's the number that makes CFOs say "let's wait on the AI thing."
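To make the arithmetic above easy to rerun with your own numbers, here is a minimal Python sketch. The token counts are copied from the breakdown above. The price of $3 per million input tokens is an assumption, chosen because it is the rate implied by ~$0.08 for ~28K tokens; check your provider's current pricing before relying on it.

```python
# Token breakdown copied from the single-context example above.
CONTEXT_TOKENS = {
    "system_prompt": 2_000,
    "api_docs": 8_000,
    "raw_json_response_1": 3_000,
    "raw_json_response_2": 4_500,
    "pattern_matching_docs": 4_000,
    "conversation_history": 6_000,
    "latest_user_message": 200,
}

# ASSUMPTION: ~$3 per million input tokens (implied by ~$0.08 per ~28K-token turn).
ASSUMED_USD_PER_MILLION_INPUT_TOKENS = 3.00


def input_cost_per_turn(tokens_by_source, usd_per_million):
    """Input-token cost of one turn when the whole context is resent."""
    total_tokens = sum(tokens_by_source.values())
    return total_tokens * usd_per_million / 1_000_000


tokens_per_turn = sum(CONTEXT_TOKENS.values())
cost_per_turn = input_cost_per_turn(CONTEXT_TOKENS, ASSUMED_USD_PER_MILLION_INPUT_TOKENS)

# Input-only lower bound for five turns. Output tokens and a growing history
# are not modeled here, which is presumably why the article rounds up to ~$0.50+.
five_turn_input_cost = 5 * cost_per_turn

# Monthly spend at the article's quoted $0.50 per conversation.
monthly_at_quoted_rate = 2_000 * 0.50

print(f"Tokens per turn:       {tokens_per_turn:,}")
print(f"Input cost per turn:   ${cost_per_turn:.4f}")
print(f"Five-turn input cost:  ${five_turn_input_cost:.2f}")
print(f"Monthly at $0.50/conv: ${monthly_at_quoted_rate:,.0f}")
```

The point of the sketch is that every line item is resent on every turn, so the biggest entries (API docs and raw JSON) dominate the bill.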
The architecture that changes the math
The fix is context isolation via task agents. Instead of stuffing everything into one context window, you dispatch ephemeral workers. Each task agent loads the docs it needs, queries one system, interprets the raw response, and returns a 200-500 token summary. Then its context is discarded. The primary agent never sees raw JSON or API docs.
The result: the primary agent stays at 4-6K tokens even after processing 50K+ of raw data through task agents.
This isn't a minor optimization. It's a 5-10x reduction in cost per conversation. The primary agent's context stays small and focused, which means faster responses, better reasoning, and dramatically lower inference costs.
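The pattern can be sketched in a few lines of Python. This is an illustration of the idea, not Amodal's actual API: `PrimaryAgent`, `run_task_agent`, `estimate_tokens`, and the fake CRM helpers are all hypothetical names, the summarizer is a stub standing in for a model call, and token counts use a crude four-characters-per-token heuristic rather than a real tokenizer.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    """Crude heuristic (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


@dataclass
class PrimaryAgent:
    """Holds only the system prompt, user messages, and task summaries."""
    system_prompt: str
    context: List[str] = field(default_factory=list)

    def context_tokens(self) -> int:
        return estimate_tokens(self.system_prompt) + sum(
            estimate_tokens(message) for message in self.context
        )


def run_task_agent(
    docs: str,
    query_system: Callable[[], str],
    summarize: Callable[[str], str],
) -> str:
    """Ephemeral worker: its local context is discarded when it returns."""
    raw_response = query_system()                # large raw payload stays in this scope
    worker_context = docs + "\n" + raw_response  # docs and raw data never leave the worker
    return summarize(worker_context)             # only the short summary is returned


# Hypothetical stand-ins for a real system query and a real model call.
def fake_crm_query() -> str:
    return '{"deal": "x", "stage": "negotiation"}' * 300


def fake_summarize(worker_context: str) -> str:
    return "3 deals have been stalled in negotiation for more than 14 days."


crm_docs = "CRM API reference text. " * 500
primary = PrimaryAgent(system_prompt="You are a sales ops assistant.")
primary.context.append("User: which deals are stuck?")

summary = run_task_agent(crm_docs, fake_crm_query, fake_summarize)
primary.context.append(f"Task summary: {summary}")

worker_tokens = estimate_tokens(crm_docs) + estimate_tokens(fake_crm_query())
print(f"Worker processed ~{worker_tokens:,} tokens")
print(f"Primary context holds ~{primary.context_tokens():,} tokens")
```

The design choice that matters is the return type of `run_task_agent`: a plain summary string. Because nothing else crosses that boundary, the primary agent's context cannot grow with the size of the docs or the raw response.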
Real numbers
Here's what conversations actually cost with context isolation in place.
[Figure: Cost per conversation by complexity]
Compared to what?
The real question isn't "is $0.05 expensive?" It's "what does this replace?"
A sales ops analyst spending 20 minutes reviewing a pipeline costs the company roughly $15 in fully loaded labor. The agent does the same work in 30 seconds for $0.05. Even at 2,000 conversations per month, you're looking at $100/month in inference costs versus $30,000/month in human labor for the same throughput.
The agent is 300x cheaper. And it works at 2am. It doesn't take PTO. It doesn't context-switch between Slack and the dashboard. It doesn't forget the edge case it learned about last month.
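The comparison is simple enough to check by hand, but a short Python sketch makes it easy to swap in your own labor rate. All three inputs come from the figures above; the implied hourly rate is derived from them, not stated in the text.

```python
# Inputs taken from the comparison above.
LABOR_COST_PER_REVIEW_USD = 15.00        # ~20 minutes of fully loaded analyst time
AGENT_COST_PER_CONVERSATION_USD = 0.05   # inference cost per conversation
CONVERSATIONS_PER_MONTH = 2_000
REVIEW_MINUTES = 20

agent_monthly = AGENT_COST_PER_CONVERSATION_USD * CONVERSATIONS_PER_MONTH
labor_monthly = LABOR_COST_PER_REVIEW_USD * CONVERSATIONS_PER_MONTH
cost_ratio = labor_monthly / agent_monthly

# Derived: the fully loaded hourly rate that $15 per 20 minutes implies.
implied_hourly_rate = LABOR_COST_PER_REVIEW_USD * 60 / REVIEW_MINUTES

print(f"Agent:  ${agent_monthly:,.0f}/month")
print(f"Labor:  ${labor_monthly:,.0f}/month")
print(f"Ratio:  {cost_ratio:.0f}x")
print(f"Implied labor rate: ${implied_hourly_rate:.0f}/hour")
```

Note that the ratio does not depend on volume: it is just $15 divided by $0.05. Volume only sets the absolute dollars at stake.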
When it doesn't make sense
Agents are not free. Honesty about when they don't make sense matters more than hype about when they do.
Simple lookups. If a database query answers the question, use a database query. An agent that wraps a SELECT statement is an expensive SELECT statement.
One-off tasks. If you do something once a quarter, the setup cost of an agent won't pay back. Just do it manually.
Physical world interaction. Agents operate in digital systems. If the task requires someone to walk to a server rack, an agent can tell you which rack, but it can't walk there.
The monthly cost model
Here's what real monthly inference spend looks like at different volumes.
[Figure: Monthly inference cost by volume]
Plus platform fee (varies by plan). Ranges reflect mix of simple and complex conversations.
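To project spend at your own volumes, a rough Python sketch follows. The two per-conversation figures are the ones quoted earlier in this article ($0.50 for the single-context design, $0.05 with context isolation). The volume tiers are illustrative assumptions, and the platform fee is left out because it varies by plan.

```python
# Per-conversation figures quoted earlier in the article.
SINGLE_CONTEXT_USD = 0.50
ISOLATED_CONTEXT_USD = 0.05

# ASSUMPTION: illustrative volume tiers, not taken from the article.
ASSUMED_MONTHLY_VOLUMES = [500, 2_000, 10_000]


def monthly_inference_cost(conversations: int, usd_per_conversation: float) -> float:
    """Inference spend only; platform fees are not modeled."""
    return round(conversations * usd_per_conversation, 2)


projection = {
    volume: (
        monthly_inference_cost(volume, SINGLE_CONTEXT_USD),
        monthly_inference_cost(volume, ISOLATED_CONTEXT_USD),
    )
    for volume in ASSUMED_MONTHLY_VOLUMES
}

for volume, (single_context, isolated) in projection.items():
    print(f"{volume:>6,} conversations: ${single_context:>8,.2f} single-context, "
          f"${isolated:>7,.2f} isolated")
```

At 2,000 conversations this reproduces the $1,000 and $100 monthly figures used above, which is a quick sanity check before plugging in a real mix of simple and complex conversations.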
The question isn't whether AI agents are affordable. It's whether you can afford to have humans doing work that an agent handles for pennies.
Amodal is an open-source agent runtime. Context isolation and task agent dispatch are built into the core.