
The First AI Feature You Should Ship (And Why It's Probably Not a Chatbot)

Most teams reach for a chatbot when adding AI. Here's why your first feature should solve a smaller, sharper problem, and five patterns that consistently win.

Amodal Team · May 21, 2026 · 9 min read

Most product teams who get told to “add AI” start in the same place: they want to ship a chatbot. Maybe they call it a copilot, an assistant, or an agent. It's what investors ask about, it's what the CEO saw in a competitor's demo last quarter, and it's the version of AI everyone has seen. Of course you want one.

It's also the worst possible first AI feature in B2B SaaS.

The teams that ship something real in their first quarter almost always start with something a little embarrassing in its smallness. A form field that fills itself. A summary at the top of a long page. A search box that finally understands what the user meant. Narrow, boring, and in production beats ambitious and still in scoping every time.

This post is about why the chatbot instinct burns six months, what to ship instead, and how to do it in roughly six weeks.

The chatbot trap

I don't blame anyone for the chatbot pull. You've seen the demo, the board has asked about it twice, and sales has a deck that mentions it. The problem isn't the idea, it's what happens when you try to scope it.

Chat is the widest possible surface area. The input box invites every question, which means your team has to plan for every question. You'll spend the first month writing a one-pager, the second month arguing about which use cases are in scope, and the third month negotiating with security. By month four you're still pre-engineering, and the team that scoped “smarter ticket routing” instead is on their second iteration.

Then there's eval. Nobody enjoys talking about it, but it matters more for a chatbot than for anything else, because every output is different and “is this answer correct?” is hard to grade when the question itself could be anything. Most chatbots ship without a real eval set, the team finds out it's wrong from support tickets, and someone is patching prompts at midnight by week three.

And the metric is fuzzy. What does success even mean? “Engagement” is meaningless on its own, “questions answered” can be gamed by counting any reply, and “customer satisfaction” takes months to show up in the data. Without a clean number you can point at in Monday's standup, the feature drifts.

To be clear
I'm not saying never ship a chatbot. Plenty of products eventually do, and some do it well. But almost all of the ones that get there started somewhere smaller. They built the eval muscle on a narrow feature, learned what their customers actually use AI for, and then earned the right to scope something more open. If you're being pushed toward chat as the first thing, push back.

What makes a good first AI feature

There are four things to look for, and none of them are exciting. That's the point.

Frequency. The feature has to live somewhere your users go every day. If they hit it once a month, you'll get one signal a month, and you'll be guessing for a year. Daily use means you find out in a week whether the thing works.

Narrowness. One job, one screen. The AI does a specific thing, not a general thing. The more specific the job, the easier the eval, the cleaner the metric, and the faster the iteration loop. “Suggest a response” is a feature; “be helpful” is a quarter of meetings.

A clean success metric. Before you scope anything, write down the number you'll use to decide if it worked, in one sentence. Tickets routed to the wrong queue drop below five percent. Response acceptance rate stays above sixty percent. Time-to-first-draft cut in half. If you can't write that sentence, the scoping isn't done.
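To make the “one sentence” concrete, here is a minimal Python sketch that turns the routing example into a check you could run every week. `RoutingOutcome`, `misroute_rate`, and `success_metric_met` are hypothetical names. The sketch also assumes that the queue a ticket finally ended up in is the correct answer, which only holds if your agents actually re-route wrong tickets instead of working them where they land.

```python
from dataclasses import dataclass

# Threshold taken from the example sentence; hypothetical, set your own.
MAX_MISROUTE_RATE = 0.05


@dataclass
class RoutingOutcome:
    predicted_queue: str  # queue the AI assigned
    final_queue: str      # queue the ticket ended up in after any human re-routing


def misroute_rate(outcomes: list[RoutingOutcome]) -> float:
    """Fraction of tickets whose AI-assigned queue differs from where they ended up."""
    if not outcomes:
        return 0.0
    wrong = sum(1 for o in outcomes if o.predicted_queue != o.final_queue)
    return wrong / len(outcomes)


def success_metric_met(outcomes: list[RoutingOutcome]) -> bool:
    """The one-sentence metric as code: misroutes stay below five percent.

    With no data at all, the metric is treated as not met.
    """
    return bool(outcomes) and misroute_rate(outcomes) < MAX_MISROUTE_RATE
```

The value is less in the code than in the shape: one number and one threshold, computed the same way every time, so nobody relitigates what “worked” means on Monday.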

Reversibility. Your first AI feature should be easy to turn off. If it breaks, you flip a flag and the product still works the way it did yesterday. In practice that means the feature is an enhancement to an existing workflow, not a replacement for one. Defaults you can override, suggestions you can ignore, summaries that sit alongside the source instead of in front of it.
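A minimal sketch of what “easy to turn off” can look like, assuming a feature that pre-fills an email subject line and a simple in-memory flag. `FLAGS`, `suggest_subject`, and the `generate` callable are hypothetical stand-ins for your own flag service and model client.

```python
from typing import Callable, Optional

# Hypothetical in-memory flag store; in practice, your existing feature-flag service.
FLAGS = {"ai_subject_suggestion": True}


def suggest_subject(body: str, generate: Callable[[str], str]) -> Optional[str]:
    """Return an AI-suggested subject line, or None to keep the old behavior.

    `generate` stands in for whatever model call you use. The result is only a
    pre-filled default; the user can still edit or clear the field.
    """
    if not FLAGS.get("ai_subject_suggestion", False):
        return None  # flag off: the form behaves exactly as it did yesterday
    try:
        suggestion = generate(body).strip()
    except Exception:
        return None  # model error: fall back to an empty field, not an error page
    return suggestion or None  # an empty suggestion is treated as no suggestion
```

The flag-off path and the error path both return `None`, which the form treats as “leave the field empty”, so turning the feature off and the model failing look identical to the user: the product as it was before.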

The filter
Hit all four and you've got something worth building. Miss one and the feature will eat more of the team than you think.

Five patterns that consistently work

When teams stick to those four criteria, the feature usually lands in one of five shapes. None of them are new, and all of them ship.

1. Smart defaults

Pre-fill a field the user was about to fill in. They accept it with a click or override it with a few keystrokes. When you're wrong, nobody notices, because it just looks like a field someone typed into. When you're right, you save a real minute every time. Subject lines from email bodies, customer fields from a domain, tags from content.

2. In-context summarization

A short summary at the top of a long page. This works almost anywhere your users skim long content every day: support threads, customer notes, contracts, meeting transcripts, audit logs. The metric is simple. Do users still click in to read the full thing, or did the summary answer their question?

3. Better search

Replace keyword search with semantic search across the customer's own data. Users type the question they actually have, not the keyword they think will match. Almost every B2B SaaS has a search box that nobody loves, and this is the pattern that fixes it. Track search abandonment and click-through on the first three results.

4. Inline autocomplete

Suggest text in a field the user is already typing in. It's a familiar interaction, it's one keystroke away from being ignored, and the metric is clean: acceptance rate. The best fit is anywhere your users write the same kind of thing over and over, like response templates, status updates, descriptions, or internal docs.

5. Classification and routing

Tag, sort, or assign incoming items automatically. Tickets routed to the right team. Documents filed in the right folder. Leads scored on fit. The user sees the result, not the prediction, and the metric is operational, which usually makes the business case easier to write.
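For the search pattern, the core mechanic is ranking documents by how close their embeddings are to the query's embedding, rather than by keyword overlap. This Python sketch assumes the vectors were already produced by some embedding model (not shown) and ranks them by brute-force cosine similarity; a real deployment would typically use a vector index instead. `search_metrics` computes the two numbers suggested for search from a hypothetical log that records which result rank, if any, each search ended with. All names are illustrative.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_search(
    query_vec: list[float], docs: list[tuple[str, list[float]]], k: int = 3
) -> list[str]:
    """Return the ids of the k documents whose embeddings are closest to the query."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


def search_metrics(click_ranks: list[Optional[int]]) -> tuple[float, float]:
    """Return (abandonment rate, top-3 click-through rate) for a log of searches.

    Each entry is the 1-based rank of the result the user clicked, or None if
    they left without clicking anything.
    """
    if not click_ranks:
        return 0.0, 0.0
    n = len(click_ranks)
    abandoned = sum(1 for r in click_ranks if r is None)
    top3 = sum(1 for r in click_ranks if r is not None and r <= 3)
    return abandoned / n, top3 / n
```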

There are other patterns that work, but these five are the ones that consistently ship in six to eight weeks with a small team without leaving a mess behind. Pick one, and don't try to do two at once even though you'll be tempted.

A six-week plan

The work splits cleanly into three two-week stretches. You can run it with three people: a PM, a full-stack engineer, and a designer. You don't need an ML engineer, because none of the five patterns above require model research, just good product judgment.

Weeks one and two. Pick the pattern, define the job, and write the eval set. The eval set is the part most teams skip and then regret. Twenty to fifty real examples, pulled from real customer data with permission, labeled with what the correct answer would have been. This isn't a research artifact, it's how you'll know later whether a prompt change made things better or worse. Skip it and you're flying blind for the next month.
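Here is a minimal sketch of an eval set doing the job described above: scoring two versions of a feature against the same labeled examples, so a prompt change comes back as better, worse, or no change. It assumes a classification-style feature where exact match is a fair grade. The three examples and their labels are invented, and `old` and `new` stand in for calls to your model with the old and new prompts. A real set would hold the twenty to fifty examples the plan calls for.

```python
from typing import Callable

# A tiny, invented slice of an eval set: real inputs (used with permission),
# each labeled with the answer a person would have wanted.
EVAL_SET = [
    {"input": "Card was charged twice this month", "expected": "billing"},
    {"input": "Can't log in with SSO", "expected": "auth"},
    {"input": "Export to CSV times out", "expected": "data"},
]


def score(predict: Callable[[str], str], eval_set: list[dict] = EVAL_SET) -> float:
    """Fraction of eval examples where the prediction exactly matches the label."""
    hits = sum(1 for ex in eval_set if predict(ex["input"]) == ex["expected"])
    return hits / len(eval_set)


def compare(old: Callable[[str], str], new: Callable[[str], str]) -> str:
    """Report whether a change helped, hurt, or did nothing on the same eval set."""
    before, after = score(old), score(new)
    if after > before:
        return f"better ({before:.2f} -> {after:.2f})"
    if after < before:
        return f"worse ({before:.2f} -> {after:.2f})"
    return f"no change ({before:.2f})"
```

For free-text outputs such as summaries, exact match is too strict, and you would need a different grader, such as a rubric a person applies. The structure stays the same: a fixed set, a repeatable score, and a before-and-after comparison on every change.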

Weeks three and four. Build it, and ship it to internal users only. Use it every day. Notice what's annoying and fix what's wrong. Don't worry about edge cases yet; focus on the obvious failures. By the end of week four, your eval score should be steady at a number you'd be willing to put in front of a real customer.

Weeks five and six. Roll to five or ten percent of customers and instrument everything: adoption, the success metric, cost per request, latency. Let it run for a week. If the metric holds, expand. If it doesn't, decide whether to iterate or kill. Don't slow-roll it forever. A feature stuck at ten percent for two months is dead; you just haven't admitted it yet.
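Rolling out to “five or ten percent of customers” is easier to reason about when each customer's assignment is stable. A common approach, sketched here in Python under the assumption that you roll out per tenant, is to hash the tenant ID into one of 100 buckets and enable the feature for buckets below the rollout percentage. The function names and the feature-name salt are illustrative.

```python
import hashlib


def rollout_bucket(tenant_id: str, feature: str) -> int:
    """Stable bucket in [0, 100) for a tenant; the same inputs always give the same bucket."""
    # Salting with the feature name gives each feature its own, independent slice.
    digest = hashlib.sha256(f"{feature}:{tenant_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(tenant_id: str, feature: str, percent: int) -> bool:
    """True if this tenant falls inside the current rollout percentage."""
    return rollout_bucket(tenant_id, feature) < percent
```

Because a tenant's bucket never changes, raising `percent` from 10 to 25 keeps every tenant who already had the feature and only adds new ones, so nobody sees it appear and then disappear. The salt keeps the same customers from landing in the first slice of every experiment.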

Some teams will move faster, and some will need longer because of compliance, integration, or messy data. But six weeks is a good baseline. If your plan is twelve, you scoped too much. If it's two, you skipped the eval set.

Fence it off
The plan only works if you fence the team off from new asks during those six weeks. Treat it as a small, time-boxed bet, not a strategic initiative.

It's worth saying out loud that roughly half of this six-week plan is product work you have to do either way: which pattern fits, what the success metric is, which surface to put it on, what a good answer actually looks like. The other half, like eval tooling, prompt versioning, cost instrumentation, and per-tenant guardrails, is infrastructure work that platforms exist to abstract. If you're building it all yourself, plan for the full six weeks. If you're using a platform, plan for the decisions and skip most of the rest.
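To give a sense of the infrastructure half, here is a minimal Python sketch of three of those pieces built by hand: a versioned prompt registry, a per-request record with a cost calculation, and a per-tenant spend cap as the simplest possible guardrail. The prices, prompt text, field names, and budget rule are all assumptions for illustration, not any provider's real rates or any platform's API.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1,000 tokens; substitute your provider's real rates.
PRICE_PER_1K_INPUT = 0.0005
PRICE_PER_1K_OUTPUT = 0.0015

# Versioned prompts: every logged request names the exact prompt text it used.
PROMPTS = {
    "route-v1": "Classify this ticket as billing, auth, or data.\n\n{ticket}",
    "route-v2": "You route support tickets. Reply with exactly one word: billing, auth, or data.\n\n{ticket}",
}


@dataclass
class RequestRecord:
    tenant_id: str
    prompt_version: str
    input_tokens: int
    output_tokens: int
    latency_ms: float

    @property
    def cost_usd(self) -> float:
        """Cost of this single request at the hypothetical prices above."""
        return (self.input_tokens * PRICE_PER_1K_INPUT + self.output_tokens * PRICE_PER_1K_OUTPUT) / 1000


def render_prompt(version: str, ticket: str) -> str:
    """Fill in a specific prompt version; raises KeyError for an unknown version."""
    return PROMPTS[version].format(ticket=ticket)


def cost_per_request(records: list[RequestRecord]) -> float:
    """Average cost across logged requests, one of the rollout numbers to watch."""
    return sum(r.cost_usd for r in records) / len(records) if records else 0.0


def within_budget(records: list[RequestRecord], tenant_id: str, cap_usd: float) -> bool:
    """Simple per-tenant guardrail: stop calling the model once a tenant's spend reaches its cap."""
    spent = sum(r.cost_usd for r in records if r.tenant_id == tenant_id)
    return spent < cap_usd
```

None of this is hard on its own; the cost is in keeping it consistent across every feature and tenant, which is the part the platform route is meant to take off your plate.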

Three ways this goes wrong

There are three failure modes I see over and over.

Picking the demo, not the workflow. The feature that wins the sales meeting often loses with real users because they hit it once a quarter, not every day. “Will it impress the customer” and “will the customer use it” are two very different questions, and only the second one matters once you've shipped.

Skipping eval to “move fast.” You will not move fast. You'll spend the second month redoing the first month, because you can't tell which of your prompt changes actually helped.

Over-polishing UI before the feature works. Beautiful UI on something broken is more confusing than ugly UI on something useful. Ship rough, iterate on the substance, and polish last when you know it's worth the polish.

Avoid those three and the six-week thing usually works.

Earn the right to ship the second one

The first AI feature you ship isn't the one that wins the market, it's the one that earns the right to ship the second one. Make it small, make it daily, and make the success metric something you can write in a single sentence. Then ship it.

The bigger surfaces, the chat, the agents, the copilots, can come later, after your team understands how your customers actually use AI and you've built the muscle to scope something open-ended without it eating two quarters. The teams that try to start there usually never get there. The teams that start small almost always do.

If you're scoping your first AI feature this week, the four criteria above are a decent filter and the five patterns are a reasonable starting menu. Pick one and start.

Six weeks is a good baseline if you're building it yourself. Book a demo and do it even faster with Amodal — the eval tooling, prompt versioning, cost instrumentation, and per-tenant guardrails come built in, so your team spends the time on the decisions that matter.