
What It Actually Takes to Ship an AI Agent
An agent isn't a feature. It's an application stack.
Intro
Over the past year, AI agents, LLMs, tool calling, and MCP have become everyday vocabulary in product and engineering conversations. But they often get collapsed into the same mental bucket, used as shorthand for “AI capability,” even though they describe very different layers of the system.
All three are true. All three also describe different layers of the stack, and using them interchangeably hides how much work is left between a prototype and a production system.
The reason is structural. Many of these tools can produce similar demos: a chat interface, a response powered by an LLM, and sometimes a tool call or integration behind the scenes. From the outside, different platforms can look like they're doing roughly the same thing, even when the systems behind them are very different.
That distinction becomes critical when you move from experimenting to shipping. Saying “we have MCP” is like saying “we have REST APIs” when someone asks about your application architecture. It's true, but it answers a different question.
An agent isn't a feature. It's an application.
It's tempting to think of an agent as something you “add” to your product: a chat interface, a workflow assistant, an automation layer. In practice, an agent behaves much more like a full application stack that happens to use an LLM where a traditional app uses business logic. It depends on infrastructure, orchestration, state management, and permissions. The LLM is one component. The rest of the system is what makes it usable, reliable, and safe in production.
Here's what that stack actually looks like:
Each layer depends on the one below it. Skip a layer and the whole thing falls over. Maybe not in the demo, but definitely in production.
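To make the dependency rule concrete, here is a minimal sketch in Python. The seven layer names and their order are assumptions, assembled from terms this article uses (LLM, tool protocol, orchestration, state, governance, deployment, skills). They are illustrative, not a canonical list. The helper reports which lower layers are missing beneath each layer you have built.

```python
# Hypothetical layer names, ordered bottom to top. These are assumptions
# for illustration, not an official or complete definition of the stack.
AGENT_STACK = [
    "llm",
    "tool_protocol",
    "orchestration",
    "state_management",
    "governance",
    "deployment",
    "skills",
]


def missing_dependencies(built, stack=AGENT_STACK):
    """For each built layer, list the unbuilt layers that sit below it."""
    gaps = {}
    for index, layer in enumerate(stack):
        if layer not in built:
            continue
        below_missing = [name for name in stack[:index] if name not in built]
        if below_missing:
            gaps[layer] = below_missing
    return gaps


# A typical demo: a model, one tool integration, and some workflow logic.
demo_layers = {"llm", "tool_protocol", "skills"}
demo_gaps = missing_dependencies(demo_layers)
print(demo_gaps)
```

In this sketch the demo's `skills` layer rests on four layers that were never built, which is the gap that tends to surface only in production.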
Breaking Down the Agent Stack
Once you can see the stack, the common shorthand starts to translate cleanly. Each of those answers maps to a single layer, usually one near the bottom.
None of these are wrong. They're just not sufficient. A programming language, an API protocol, a routing library, and a form input are all real things, but nobody would call that combination a SaaS application. The LLM is foundational, but it isn't a product any more than Python is a product. Everything above the LLM is the actual work, and it's the work most teams underestimate.
The demo trap
The reason teams underestimate it is simple: connecting an LLM to a tool can be done in an afternoon, and the demo is magic. Leadership sees it and greenlights production. Then the team spends six months discovering every layer they skipped: error handling, state management, permissions, audit logging, deployment, multi-tenancy. The demo was 5% of the work and 100% of the optimism.
These aren't edge cases. They're the core of making something reliable. What looked simple in a prototype turns into a much bigger system problem in production, and because the vocabulary is so new, teams often don't realize what they've signed up for until they're already mid-build.
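The gap between the afternoon demo and the production version can be sketched in a few lines of Python. Everything here is hypothetical: `demo_tool_call`, `production_tool_call`, `PermissionDenied`, and the `flaky_lookup` tool are made-up names, the allow-list stands in for a real permission system, and the in-memory list stands in for a real audit store. Treat it as an illustration of three of the skipped layers (permissions, error handling, audit logging), not a reference implementation.

```python
import time
from typing import Any, Callable


class PermissionDenied(Exception):
    """Raised when a user is not allowed to invoke a tool (hypothetical)."""


def demo_tool_call(tool: Callable[..., Any], args: dict[str, Any]) -> Any:
    # The afternoon demo: call the tool and hope it works.
    return tool(**args)


def production_tool_call(
    tool: Callable[..., Any],
    args: dict[str, Any],
    *,
    user: str,
    allowed_users: set[str],
    audit_log: list[dict[str, Any]],
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> Any:
    """Same call, plus a permission check, retries, and an audit trail."""
    tool_name = tool.__name__

    # Permissions: refuse before the tool ever runs, and record the refusal.
    if user not in allowed_users:
        audit_log.append({"user": user, "tool": tool_name, "outcome": "denied"})
        raise PermissionDenied(f"{user} may not call {tool_name}")

    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = tool(**args)
        except Exception as error:
            # Error handling: record the failure and retry with linear backoff.
            last_error = error
            audit_log.append(
                {"user": user, "tool": tool_name, "outcome": "error", "attempt": attempt}
            )
            if attempt < max_attempts:
                time.sleep(backoff_seconds * attempt)
        else:
            audit_log.append(
                {"user": user, "tool": tool_name, "outcome": "ok", "attempt": attempt}
            )
            return result
    raise RuntimeError(f"{tool_name} failed after {max_attempts} attempts") from last_error


# A made-up tool that times out on its first call, then succeeds.
_calls = {"count": 0}


def flaky_lookup(order_id: str) -> dict[str, str]:
    _calls["count"] += 1
    if _calls["count"] == 1:
        raise TimeoutError("upstream timeout")
    return {"order_id": order_id, "status": "shipped"}


audit_log: list[dict[str, Any]] = []
result = production_tool_call(
    flaky_lookup,
    {"order_id": "A1"},
    user="alice",
    allowed_users={"alice"},
    audit_log=audit_log,
)
```

The demo version is one line. The production version is most of the block, and it still leaves out state management, deployment, and multi-tenancy.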
You'd never build your own Stripe
In traditional SaaS, there are whole categories nobody builds from scratch anymore: authentication, payments, observability, infrastructure. Those debates ended years ago. In the agent space, the same pattern exists, but it's less established, so teams end up rebuilding foundational systems without quite realizing they're doing it.
Here's the same picture in SaaS terms:
You'd get laughed out of a board meeting for proposing to build your own payment processor. But “we'll build our own agent orchestration layer” still sounds reasonable to engineering teams, because the agent space is new enough that the equivalent infrastructure doesn't have a household name yet.
The underlying decision is the same one teams have always made: what's core to your product, and what's infrastructure?
Three paths, very different amounts of work
Once you frame it that way, the build-vs-buy options become much clearer. There are roughly three approaches, and they differ mostly in how much of the stack ends up on your team.
What you get vs. what you build
Direct API usage (“just use OpenAI”) gives you the foundation and leaves the other six layers for your team to build. Frameworks like LangChain offer scaffolding around orchestration and integrations, but governance, deployment, and multi-tenancy are still on you, which leaves about four layers of work. An agent runtime handles the full system, so your team can focus on the one layer that's actually unique to your product: the skills and workflow logic.
The right path depends on whether agent infrastructure is your product or the means to delivering one.
Ship the agent, not the infrastructure
The teams that ship AI agents fastest aren't the ones with the best prompt engineers. They're the ones who recognized early that an agent is a full application, with all the infrastructure that implies, and made the build-vs-buy decision honestly.
That shift changes how you plan the work, how you design the system, and where your engineers spend their time. It also clarifies why something that felt simple in a demo turns out to be significantly more complex in production.
If you have an ML team and a year of runway, and agent infrastructure is your core product, building from scratch can be the right call. For everyone else, the goal is to ship something reliable without piling on complexity you don't need. That means focusing on what's actually unique to your product and being deliberate about what you build versus what you rely on.
The challenge isn't getting an agent to work. It's getting it to work consistently, safely, and at scale. That's what turns an interesting prototype into something users can depend on.
You wouldn't build your own Stripe. There's no reason to build your own agent runtime either.
Amodal is an open-source agent runtime. Configure AI agents with a git repo of markdown files. Your engineers write skills. The platform handles the rest of the stack.