Insight

What It Actually Takes to Ship an AI Agent

An agent isn't a feature. It's an application stack. Here's the stack, and what each "shortcut" actually covers.

Intro

Over the past year, AI agents, LLMs, tool calling, and MCP have become everyday vocabulary in product and engineering conversations. But they often get collapsed into the same mental bucket, used as shorthand for “AI capability,” even though they describe very different layers of the system.

All three are true. All three also describe different layers of the stack, and using them interchangeably hides how much work is left between a prototype and a production system.

The reason is structural. Many of these tools can produce similar demos: a chat interface, a response powered by an LLM, and sometimes a tool call or integration behind the scenes. From the outside, different platforms can look like they're doing roughly the same thing, even when the systems behind them are very different.

That distinction becomes critical when you move from experimenting to shipping. Saying “we have MCP” is like saying “we have REST APIs” when someone asks about your application architecture. It's true, but it answers a different question.

An agent isn't a feature. It's an application.

It's tempting to think of an agent as something you “add” to your product: a chat interface, a workflow assistant, an automation layer. In practice, an agent behaves much more like a full application stack that happens to use an LLM where a traditional app uses business logic. It depends on infrastructure, orchestration, state management, and permissions. The LLM is one component. The rest of the system is what makes it usable, reliable, and safe in production.

Here's what that stack actually looks like, from top to bottom:

Deployment & Multi-tenancy — hosting, CI/CD, tenant isolation
Governance & Observability — audit trails, permissions, monitoring
Your Product UI — where your users interact with the agent
Skills & Workflow Logic — what the agent knows how to do
Connections & Integrations — talking to your systems (APIs, DBs, SaaS)
Agent Runtime — orchestration loop, state, memory, reasoning
LLM — OpenAI, Anthropic, etc.

Each layer depends on the one below it. Skip a layer and the whole thing falls over. Maybe not in the demo, but definitely in production.

An AI agent is not a feature you bolt onto your product. It's a full application stack that happens to use an LLM where a traditional app uses business logic.

Breaking down the agent stack

Once you can see the stack, the common shorthand starts to translate cleanly. Each of those answers maps to a single layer, usually one near the bottom.

“We're using OpenAI” — access to an LLM. SaaS equivalent: “we're using Python.”
“We have MCP” — ability to call tools. SaaS equivalent: “we're using REST APIs.”
“Built with LangChain” — a tool-calling loop in a script. SaaS equivalent: “we wrote an Express.js server.”
“Added a chat widget” — user input layer. SaaS equivalent: “we added a contact form.”

None of these are wrong. They're just not sufficient. A programming language, an API protocol, a routing library, and a form input are all real things, but nobody would call that combination a SaaS application. The LLM is foundational, but it isn't a product any more than Python is a product. Everything above the LLM is the actual work, and it's the work most teams underestimate.

The demo trap

The reason teams underestimate it is simple: connecting an LLM to a tool can be done in an afternoon, and the demo is magic. Leadership sees it and greenlights production. Then the team spends six months discovering every layer they skipped: error handling, state management, permissions, audit logging, deployment, multi-tenancy. The demo was 5% of the work and 100% of the optimism.

You'd never build your own Stripe

In traditional SaaS, there are whole categories nobody builds from scratch anymore: authentication, payments, observability, infrastructure. Those debates ended years ago. In the agent space, the same pattern exists, but it's less established, so teams end up rebuilding foundational systems without quite realizing they're doing it.

Here's the same picture in SaaS terms:

Auth (Clerk, Auth0) → Permissions & guardrails
Payments (Stripe) → LLM orchestration & token management
Observability (Datadog) → Audit trails & session logging
Database (Postgres) → State & memory
Deployment (Vercel, AWS) → Multi-tenancy & hosting
Framework (Django, Rails) → Agent runtime

You'd get laughed out of a board meeting for proposing to build your own payment processor. But “we'll build our own agent orchestration layer” still sounds reasonable to engineering teams, because the agent space is new enough that the equivalent infrastructure doesn't have a household name yet.

Three paths, very different amounts of work

Once you frame it that way, the build-vs-buy options become much clearer. There are roughly three approaches, and they differ mostly in how much of the stack ends up on your team.

Direct API usage (“just use OpenAI”) gives you the foundation and leaves the other six layers for your team to build. Frameworks like LangChain offer scaffolding around orchestration and integrations, but governance, deployment, and multi-tenancy are still on you. An agent runtime handles the full system, leaving your team to focus on the one layer that's actually unique to your product: the skills and workflow logic.

Ship the agent, not the infrastructure

The teams that ship AI agents fastest aren't the ones with the best prompt engineers. They're the ones who recognized early that an agent is a full application, with all the infrastructure that implies, and made the build-vs-buy decision honestly.

The challenge isn't getting an agent to work. It's getting it to work consistently, safely, and at scale. That's what turns an interesting prototype into something users can depend on.

You wouldn't build your own Stripe. There's no reason to build your own agent runtime either.