
The Unit Economics of AI Agents: What It Actually Costs

AI agents are cheaper than you think if you architect them right. Context isolation is the key cost lever.

Amodal Team · March 20, 2026 · 7 min read

The naive cost model

Most teams estimate AI agent costs by multiplying token price by average conversation length. This often produces terrifying numbers, but only because the architecture is flawed.

Here's what a typical "just call the API" implementation looks like: you stuff everything into one context window. API documentation goes in. Raw JSON responses from every tool call go in. The entire conversation history, including every prior tool output, goes in. A single conversation can hit 50K+ tokens before you've done anything useful.

Typical single-context conversation:

System prompt .............. 2,000 tokens
API docs (loaded once) ..... 8,000 tokens
Raw JSON response #1 ...... 3,000 tokens
Raw JSON response #2 ...... 4,500 tokens
Pattern matching docs ...... 4,000 tokens
Conversation history ....... 6,000 tokens
User's latest message ...... 200 tokens
─────────────────────────────────────────
Total input per turn: ~28,000 tokens
Cost per turn (Claude): ~$0.08
Cost per 5-turn conversation: ~$0.50+ (the context grows every turn)

At $0.50 per conversation, 2,000 conversations a month puts you at $1,000/month in inference costs alone. That's the number that makes CFOs say "let's wait on the AI thing."
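The arithmetic behind these numbers fits in a few lines. This is a minimal sketch under stated assumptions. It uses an input price of $3.00 per million tokens, which is an assumption chosen because it reproduces the ~$0.08 per 28K-token turn above. It ignores output tokens and assumes a hypothetical 3,000 tokens of new tool output and history added per turn.

```python
# Naive single-context cost model: every turn resends the entire context.
# ASSUMPTIONS (not from the article): $3.00 per million input tokens,
# output tokens ignored, and a hypothetical 3,000 tokens added per turn.

PRICE_PER_MTOK = 3.00  # assumed input price, USD per million tokens

# First-turn context, taken from the breakdown above (tokens).
FIRST_TURN_CONTEXT = {
    "system_prompt": 2_000,
    "api_docs": 8_000,
    "raw_json_1": 3_000,
    "raw_json_2": 4_500,
    "pattern_docs": 4_000,
    "conversation_history": 6_000,
    "user_message": 200,
}


def input_cost(tokens: int, price_per_mtok: float = PRICE_PER_MTOK) -> float:
    """USD cost of sending `tokens` input tokens to the model once."""
    return tokens * price_per_mtok / 1_000_000


def naive_conversation_cost(first_turn: int, turns: int, growth_per_turn: int) -> float:
    """Sum of per-turn costs when turn i resends first_turn + i * growth tokens."""
    return sum(input_cost(first_turn + i * growth_per_turn) for i in range(turns))


if __name__ == "__main__":
    first_turn = sum(FIRST_TURN_CONTEXT.values())
    per_conversation = naive_conversation_cost(first_turn, turns=5, growth_per_turn=3_000)
    print(f"first turn: {first_turn:,} tokens, ${input_cost(first_turn):.3f}")
    print(f"5-turn conversation: ${per_conversation:.2f}")
    print(f"2,000 conversations/month: ${2_000 * per_conversation:,.0f}")
```

Under these assumptions the first turn is 27,700 tokens (about $0.083). The five-turn total comes to about $0.51, which is consistent with the ~$0.50+ estimate once the context keeps growing. Because every turn resends everything before it, total cost rises faster than the number of turns.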

The problem

That's not the cost of AI agents. That's the cost of doing it wrong.

The architecture that changes the math

The fix is context isolation via task agents. Instead of stuffing everything into one context window, you dispatch ephemeral workers. Each task agent loads the docs it needs, queries one system, interprets the raw response, and returns a 200-500 token summary. Then its context is discarded. The primary agent never sees raw JSON or API docs.
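The dispatch pattern can be sketched in Python. This is a minimal illustration, not Amodal's implementation or API. `PrimaryAgent`, `run_task_model`, and `approx_tokens` are hypothetical names. The model call is stubbed out so the code runs offline, and tokens are approximated as four characters each.

```python
from dataclasses import dataclass, field


def approx_tokens(text: str) -> int:
    """Crude token estimate (~4 characters per token); illustration only."""
    return len(text) // 4


def run_task_model(docs: str, raw_response: str) -> str:
    """Hypothetical stand-in for the LLM call a task agent would make.

    A real task agent would send docs + raw_response to a model and ask for a
    short summary. This stub only reports what it saw, so the sketch runs
    offline without any API.
    """
    seen = approx_tokens(docs) + approx_tokens(raw_response)
    return f"[summary: interpreted {seen} tokens of docs and raw JSON]"


@dataclass
class PrimaryAgent:
    system_prompt: str
    context: list[str] = field(default_factory=list)
    raw_tokens_processed: int = 0

    def __post_init__(self) -> None:
        self.context.append(self.system_prompt)

    def dispatch_task(self, docs: str, raw_response: str) -> str:
        # The task agent starts from its own fresh context: docs + raw data.
        task_context = [docs, raw_response]
        self.raw_tokens_processed += sum(approx_tokens(t) for t in task_context)
        summary = run_task_model(*task_context)
        # Only the summary enters the primary context; task_context is a
        # local variable and is dropped when this method returns.
        self.context.append(summary)
        return summary

    def context_tokens(self) -> int:
        return sum(approx_tokens(msg) for msg in self.context)


if __name__ == "__main__":
    agent = PrimaryAgent(system_prompt="You are a sales ops assistant.")
    for _ in range(3):
        # Hypothetical payloads: ~8,000 tokens of docs, ~3,000 of raw JSON each.
        agent.dispatch_task(docs="d" * 32_000, raw_response="j" * 12_000)
    print(agent.raw_tokens_processed, agent.context_tokens())
```

In the usage example, 33,000 tokens of docs and raw JSON pass through three task agents. The primary context ends up holding only the system prompt and three one-line summaries. Scoping does the work: because `task_context` is local to `dispatch_task`, nothing but the returned summary survives the call.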

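Without task agents:

System prompt .............. 2,000 tokens
API docs ................... 8,000 tokens
Raw JSON #1 ................ 3,000 tokens
Raw JSON #2 ................ 4,500 tokens
Pattern docs ............... 4,000 tokens
Conversation ............... 6,000 tokens
─────────────────────────────────────────
Total: 28K+ and growing

With task agents:

System prompt .............. 2,000 tokens
User message ............... 200 tokens
Task result #1 ............. 200 tokens
Task result #2 ............. 250 tokens
Task result #3 ............. 150 tokens
─────────────────────────────────────────
Total: ~3K, clean and focused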

Because each task agent's context is discarded once it returns its summary, the primary agent stays at 4-6K tokens even after task agents have processed 50K+ tokens of raw data.
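To make the 4-6K figure concrete, here is a back-of-the-envelope comparison. The payload and summary sizes are hypothetical. They were picked to match the ranges in the text: 50K+ of raw data and summaries of a few hundred tokens. Only the system prompt and user message sizes come from the diagram above.

```python
# Primary-agent context size with and without task agents.
# ASSUMPTIONS: the raw payload and summary sizes below are hypothetical,
# chosen to fall in the ranges the article mentions (50K+ raw data,
# 200-500 token summaries). Docs loaded by task agents are not counted
# against the primary agent because their contexts are discarded.

SYSTEM_PROMPT = 2_000
USER_MESSAGE = 200

raw_payloads = [3_000, 4_500, 8_000, 6_500, 9_000, 7_000, 5_500, 8_500]  # tokens
summaries = [200, 250, 200, 300, 400, 350, 450, 250]                      # tokens


def single_context_tokens(raw: list[int]) -> int:
    """Everything the tools return stays in the one context window."""
    return SYSTEM_PROMPT + USER_MESSAGE + sum(raw)


def isolated_primary_tokens(task_summaries: list[int]) -> int:
    """Only the task agents' summaries reach the primary agent."""
    return SYSTEM_PROMPT + USER_MESSAGE + sum(task_summaries)


print(sum(raw_payloads))                    # 52000
print(single_context_tokens(raw_payloads))  # 54200
print(isolated_primary_tokens(summaries))   # 4600
```

With eight hypothetical task agents chewing through 52,000 tokens of raw data, the primary agent holds 4,600 tokens. A single shared context would hold 54,200 tokens before counting any docs.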

This isn't a minor optimization. It's a 5-10x reduction in cost per conversation. The primary agent's context stays small and focused, which means faster responses, better reasoning, and dramatically lower inference costs.

Real numbers

Here's what conversations actually cost with context isolation in place.

Cost per conversation by complexity

Complexity                  Total tokens   Cost
Simple Q&A                  ~3K            $0.02
Standard investigation      ~15K           $0.05
Complex multi-system        ~40K           $0.08
Deep dive (parallel)        ~60K           $0.12

Key insight

The primary agent's context stays under 6K tokens in all four scenarios. What changes is how many task agents get dispatched, and each task agent's context is discarded as soon as it returns.

Compared to what?

The real question isn't "is $0.05 expensive?" It's "what does this replace?"

A sales ops analyst spending 20 minutes reviewing a pipeline costs the company roughly $15 in fully loaded labor. The agent does the same work in 30 seconds for $0.05. At 2,000 conversations per month, that's $100/month in inference costs versus $30,000/month in human labor for the same throughput.

300x cheaper than the human equivalent at 2,000 conversations/month

$100/mo agent cost vs. $30,000/mo fully loaded labor
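The comparison reduces to two multiplications. This sketch uses only the figures from the text: $15 per human review, $0.05 per agent conversation, and 2,000 conversations per month. `monthly_costs` is an illustrative helper name.

```python
# Agent vs. human monthly cost, using the article's per-task figures.

HUMAN_COST_PER_TASK = 15.00  # USD, fully loaded labor for a 20-minute review
AGENT_COST_PER_TASK = 0.05   # USD, one standard agent conversation


def monthly_costs(conversations: int) -> tuple[float, float]:
    """Return (agent_cost, human_cost) in USD for one month at this volume."""
    return conversations * AGENT_COST_PER_TASK, conversations * HUMAN_COST_PER_TASK


agent, human = monthly_costs(2_000)
print(f"agent ${agent:,.0f}/mo, human ${human:,.0f}/mo, ratio {human / agent:.0f}x")
```

The 300x ratio does not depend on volume. It is simply $15 / $0.05 at any scale, so volume changes the absolute savings but not the multiple.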

The agent is 300x cheaper. And it works at 2am. It doesn't take PTO. It doesn't context-switch between Slack and the dashboard. It doesn't forget the edge case it learned about last month.

When it doesn't make sense

Agents are not free. Honesty about when they don't make sense matters more than hype about when they do.

Simple lookups. If a database query answers the question, use a database query. An agent that wraps a SELECT statement is an expensive SELECT statement.

One-off tasks. If you do something once a quarter, the setup cost of an agent won't pay back. Just do it manually.

Physical world interaction. Agents operate in digital systems. If the task requires someone to walk to a server rack, an agent can tell you which rack, but it can't walk there.
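The one-off case is a payback question: how many runs does it take for per-run savings to cover the one-time cost of building the agent? Here is a minimal sketch of that calculation. It assumes a hypothetical $5,000 setup cost (the article gives no figure) and reuses the $15 human and $0.05 agent per-run costs from the comparison above.

```python
import math

# Payback estimate for automating a recurring task with an agent.
# ASSUMPTION: the $5,000 setup cost is hypothetical; the per-run costs reuse
# the article's $15 human and $0.05 agent figures.


def runs_to_break_even(setup_cost: float, human_per_run: float, agent_per_run: float) -> int:
    """Smallest number of runs whose cumulative savings cover the setup cost."""
    savings_per_run = human_per_run - agent_per_run
    if savings_per_run <= 0:
        raise ValueError("agent is not cheaper per run, so it never pays back")
    return math.ceil(setup_cost / savings_per_run)


runs = runs_to_break_even(setup_cost=5_000, human_per_run=15.00, agent_per_run=0.05)
print(runs)                                         # 335
print(f"{runs / 4:.0f} years at one run per quarter")
```

At one run per quarter, 335 runs is roughly 84 years. At 2,000 runs a month, the same setup cost is recovered within the first month.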

The sweet spot

The economics are compelling when the task is repeated frequently, requires data from multiple systems, and benefits from natural language interaction.

The monthly cost model

Here's what real monthly inference spend looks like at different volumes.

Monthly inference cost by volume

500 conversations/mo .......... $25 – $60
2,000 conversations/mo ........ $100 – $240
10,000 conversations/mo ....... $500 – $1,200

Platform fees (which vary by plan) are extra. Ranges reflect a mix of simple and complex conversations, averaging roughly $0.05 to $0.12 per conversation.

Important

These numbers assume context isolation via task agents. Without it, expect costs 5-10x higher.
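The monthly ranges above follow from a per-conversation cost of roughly $0.05 to $0.12. That range is an inference, but it reproduces every row. Below is a minimal sketch that regenerates the ranges and applies the 5-10x multiplier for the no-isolation case. It excludes platform fees, and the function names are illustrative.

```python
# Monthly inference cost by volume, excluding platform fees.
# ASSUMPTION: $0.05-$0.12 per conversation is inferred from the ranges above.

LOW_PER_CONVERSATION = 0.05   # USD per conversation, low end of the blend
HIGH_PER_CONVERSATION = 0.12  # USD per conversation, high end of the blend


def monthly_range(conversations: int) -> tuple[float, float]:
    """(low, high) monthly USD with context isolation via task agents."""
    return (conversations * LOW_PER_CONVERSATION,
            conversations * HIGH_PER_CONVERSATION)


def without_isolation(conversations: int) -> tuple[float, float]:
    """Same volume without task agents: 5x the low end, 10x the high end."""
    low, high = monthly_range(conversations)
    return low * 5, high * 10


if __name__ == "__main__":
    for volume in (500, 2_000, 10_000):
        low, high = monthly_range(volume)
        naive_low, naive_high = without_isolation(volume)
        print(f"{volume:>6,}/mo: ${low:,.0f}-${high:,.0f} "
              f"(without isolation: ${naive_low:,.0f}-${naive_high:,.0f})")
```

At 2,000 conversations a month, this puts spend at $100-$240 with task agents and $500-$2,400 without them.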

The question isn't whether AI agents are affordable. It's whether you can afford to have humans doing work that an agent handles for pennies.

Amodal is an open-source agent runtime. Context isolation and task agent dispatch are built into the core.
