The Conversational Layer Is Coming for Every SaaS Product

Railway shipped an AI assistant that made their entire navigation layer optional. Every complex SaaS product is about to face the same question.

Amodal Team · March 23, 2026 · 12 min read

The thing that happened

I needed to set up a Postgres database, a Redis cache, a Docker container, and some environment variables on Railway. Their product has an AI assistant. I typed what I wanted and it configured everything. The database spun up, the environment variables populated, the container connected to both services. Maybe two minutes of typing.

Then I went to Google Analytics to add a team member. I spent ten minutes navigating through Admin, then Property Access Management, then figuring out which role mapping applied, then backing out because I was in the wrong property. Standard GA4 experience.

The contrast was jarring. Not because Railway's AI was doing anything exotic, but because it made me realize how much of my time in SaaS products is spent navigating, not doing. Finding the right settings page, remembering which submenu has the thing I need, clicking through confirmation dialogs. All of that is overhead. None of it is the task.

I am not writing this to praise Railway or bury Google Analytics. Both are fine products. I am writing this because the contrast revealed something structural about where software is headed. The value was not in the AI being smart. The value was in the AI making the product's existing capabilities accessible without requiring me to learn the product's navigation.

Railway's assistant is running Claude 3.5 Sonnet. Not even the latest model. Some prompts, some API integrations. And it made their entire navigation layer optional. That's the part worth thinking about.

What actually changed

Railway didn't add a new feature. They changed the interaction model. The product is the same product with the same capabilities. The same databases, the same deployment options, the same networking configuration. But instead of navigating to Settings, then Databases, then New, then PostgreSQL, then Configure, you type “add a postgres database.”

The AI didn't make the product smarter. It made the product usable.

This matters because Railway has one of the more confusing UIs in the developer tools space. Nested modals inside nested modals. Workspaces that are usually just one block but still require you to navigate into them. Admin management on a different paradigm than project management. A mental model that makes sense once you've internalized it, but that takes weeks to internalize.

The AI didn't fix any of that. It routed around it entirely. You don't need to understand Railway's information architecture if you can just say what you want. The navigation layer, the thing that product teams spend months designing and refining and user-testing, became a fallback for edge cases.

The shift
The traditional SaaS interaction model is: learn the product's structure, then navigate to the right place, then perform your action. The conversational model is: state your intent. The product figures out the rest.

Think about what that means for the economics of product design. Companies spend enormous effort on information architecture, navigation patterns, onboarding flows, tooltip tours, help documentation. All of that effort exists because the user needs to find things. If the user can just ask, the entire findability problem evaporates.

The onboarding implications alone are significant. New users do not need to learn the product's structure before they can do useful work. They describe what they want and the product does it. The learning curve flattens from weeks to minutes. Time-to-value, the metric every SaaS company obsesses over, collapses. A user who would have churned at day three because they could not figure out the navigation stays because they never needed to figure out the navigation.

Support costs drop too. A huge percentage of SaaS support tickets are “how do I do X?” questions. The user knows what they want. They just cannot find where to do it. A conversational layer handles those questions before they become tickets. Not by providing documentation links, but by actually performing the action.

The findability problem doesn't evaporate completely, of course. There are tasks where a visual interface is genuinely better. Dashboards. Data visualization. Spatial arrangement of elements. Drag-and-drop workflows. Monitoring screens where spatial position carries meaning. The conversational layer does not replace those. But for the vast majority of SaaS interactions (configure this, change that, show me this report, add this user, connect this integration, update this setting), the graphical navigation layer is pure friction. The user knows what they want. The product makes them work to express it in clicks instead of words.

Where it broke

I want to be specific about this because the failure mode is more interesting than the success.

After setting up Postgres and Redis, the Docker container couldn't connect to the database. A standard connection issue. Could be a dozen things: network configuration, port binding, connection string format, service discovery, DNS resolution within Railway's internal network.

I asked the assistant to help troubleshoot. Its solution was straightforward and confident: delete the Postgres service and recreate it.

That's not troubleshooting. That's the equivalent of a tech support rep telling you to reinstall Windows. When I pushed back, the assistant suggested disabling the database entirely. Just remove the thing that wasn't working. No checking the network config. No examining port bindings. No verifying the connection string format. No escalation path. No “I'm not sure about this one, here are some things to check manually.”

The failure
When the happy path works, the assistant is magic. When something goes wrong, the assistant doesn't know how to reason about the problem. It knows API calls, not methodology. It can create and delete, but it can't diagnose.

This is the gap between a chatbot and an agent. A chatbot has a model and some API access. It can translate natural language into API calls, which is genuinely useful for happy-path operations. But when things break, you need something different. You need methodology. You need a system that knows: when Postgres can't connect, you check the network first, then the port binding, then the connection string, then DNS resolution. You don't delete the database.

The difference is structured reasoning about failure modes. A chatbot sees “Postgres won't connect” and pattern-matches to the most common resolution it's seen in training data. An agent sees “Postgres won't connect” and works through a diagnostic tree, checking each potential cause before suggesting a fix. The chatbot is fast and usually wrong when things are broken. The agent is methodical and usually right.
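Here is a minimal sketch of what "working through a diagnostic tree" could look like in code. It assumes each check is a small function that inspects a snapshot of the environment and returns a finding, or nothing if that layer looks healthy. The check functions and the keys in the `env` dictionary are hypothetical illustrations, not Railway's API.

```python
from typing import Callable, Optional

# Each check inspects an environment snapshot and returns a human-readable
# finding, or None if that layer looks healthy. All keys are hypothetical.
Check = Callable[[dict], Optional[str]]

def check_network(env: dict) -> Optional[str]:
    if env.get("client_network") != env.get("db_network"):
        return "services are on different internal networks"
    return None

def check_port(env: dict) -> Optional[str]:
    if env.get("db_listen_port") != env.get("client_target_port"):
        return "client targets a port the database is not listening on"
    return None

def check_connection_string(env: dict) -> Optional[str]:
    url = env.get("connection_string", "")
    if not url.startswith(("postgres://", "postgresql://")):
        return "connection string does not use a Postgres URL scheme"
    return None

def check_dns(env: dict) -> Optional[str]:
    if not env.get("hostname_resolves", False):
        return "database hostname does not resolve from the client"
    return None

# Order matters: network, then port binding, then connection string, then DNS.
DIAGNOSTIC_ORDER: list[tuple[str, Check]] = [
    ("network", check_network),
    ("port binding", check_port),
    ("connection string", check_connection_string),
    ("dns", check_dns),
]

def diagnose(env: dict) -> str:
    """Run the checks in order and report the first finding."""
    for name, check in DIAGNOSTIC_ORDER:
        finding = check(env)
        if finding is not None:
            return f"{name}: {finding}"
    # Nothing found: escalate instead of suggesting deletion.
    return "no cause found; escalate to a human with the collected details"
```

The point of the sketch is the shape, not the checks themselves: the fallback when nothing is found is escalation, and "delete the database" never appears as an option.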

Railway's assistant is a chatbot with API access. That's not a criticism. It's a V1, and the V1 is impressive for what it does. But it illustrates a pattern that matters: the happy path is easy. The edge cases are where you need real engineering.

Every SaaS product has this problem

Railway's UI confusion is not unique. Every complex SaaS product has the same fundamental issue. The product is capable, but the UX is a maze. The features exist. Users just can't find them.

Salesforce. You can do almost anything in Salesforce. You can also spend twenty minutes trying to find where to change a field mapping. Setup menus inside setup menus. Object Manager vs. Setup Home vs. Lightning App Builder. Terminology that means different things in different contexts. A search function that returns hundreds of results when you type “field.”

Jira. Genuinely powerful project management underneath layers of configuration screens that make you question your career choices. Boards vs. Backlogs vs. Roadmaps vs. Plans vs. whatever they renamed things to this quarter. Scheme after scheme after scheme: workflow schemes, notification schemes, permission schemes, issue type schemes.

AWS Console. This one barely needs explanation. Hundreds of services, each with its own UI conventions. IAM policies that require a PhD to write correctly. CloudFormation templates that are technically YAML but spiritually hieroglyphics.

Google Analytics. The GA4 migration broke the mental models of millions of marketers. Events vs. conversions vs. key events. The explore interface that looks nothing like the reports interface. Property settings vs. data stream settings vs. account settings.

HubSpot. Forty different “settings” pages. Contact properties vs. deal properties vs. company properties, each with their own configuration screen. Workflows that live in a completely different section from the sequences they resemble. A CRM that is genuinely excellent once you know where everything is, and genuinely baffling until you do.

QuickBooks. Categorizing transactions should be simple. It is not. Chart of accounts that nobody outside accounting understands. Bank rules that almost work but not quite. Reports that require you to understand the difference between cash basis and accrual basis before you can ask a question. Every small business owner has the same experience: the bookkeeper knows how to use it, nobody else does.

Workday, ServiceNow, Zendesk, Datadog, Segment. The list goes on. These are products that took years, in some cases decades, to build, that serve millions of users, and that most users navigate using the 20% of features they've memorized. The other 80% might as well not exist.

The 20% problem
Most users of complex SaaS products learn 20% of the features and navigate around the rest. Not because the other 80% isn't useful, but because finding and learning new features costs more cognitive effort than the feature is worth. A conversational layer makes 100% of the product accessible through natural language.

A conversational layer also unlocks features that product teams built but nobody uses. Every mature SaaS product has a graveyard of well-designed features that failed because users never found them. Power filters. Advanced reporting. Batch operations. Keyboard shortcuts. Custom automation rules. These features work perfectly. They sit behind three clicks and a submenu. Usage is 2% of the user base. A conversational layer surfaces them through intent: “filter my contacts by last activity date” activates the power filter that nobody knew existed. The feature adoption problem is a discoverability problem, and natural language solves discoverability.

This is happening now. Not in some hypothetical future. Railway shipped it. Notion has it. Linear has it. Vercel is building it. GitHub Copilot moved from code completion to a full conversational layer over the repository. The question is not whether your product gets a conversational layer. The question is who builds it and whether it actually works when things go wrong.

The build-it-yourself reality

The best-funded, most technically sophisticated companies will build their own. Railway has great engineers. They picked a model, wrote some prompts, connected their APIs. V1 shipped fast and the happy path is genuinely impressive. Credit where it's due.

But look at what the V1 is missing.

Model selection. They're on Claude 3.5 Sonnet. There are newer, better models available now, and there will be even better models next quarter. Claude 3.5 Sonnet was a good choice when they built it. Staying on it when better options exist is leaving quality on the table. But switching models means re-testing every prompt, every API integration, every edge case. It is not a config change. It is a project. So companies stay on older models longer than they should, and their users get worse answers than they could.

Guardrails. The assistant suggested deleting a production database as a troubleshooting step. There is no system preventing that suggestion. No classification of destructive vs. non-destructive operations. No confirmation layer for actions that could cause data loss. No policy that says “never suggest deleting a resource as a first-line troubleshooting step.”

Learning loop. When I got the bad suggestion, that knowledge went nowhere. The next user who hits the same Postgres connection issue will get the same bad advice. There is no feedback mechanism. No way for the system to learn that “delete and recreate” is a terrible answer to connection problems.

Observability. Railway's VP of Product cannot see how often the assistant suggests destructive actions. Cannot filter conversations by failure mode. Cannot tell whether, say, 15% of Postgres-related queries end in “just delete it” suggestions. The AI layer is a black box to the team that owns the product.

Methodology. The assistant has prompts and API access. It does not have structured knowledge about how to troubleshoot Railway infrastructure. There is no encoded expertise that says: for connection issues, check networking first, then port bindings, then the connection string, then DNS. The model is improvising based on its training data, which works great for common operations and fails unpredictably for anything unusual.

The compounding cost
These are not V1 problems that get fixed in V2. They are ongoing engineering commitments. Model updates, prompt engineering, guardrail tuning, edge case handling, observability dashboards, the learning flywheel. That is a team, not a weekend project. And it compounds: every new feature in the product needs to be reflected in the AI layer, tested, guardrailed, and monitored.

Every SaaS company that builds their own AI layer is signing up to maintain it forever. That's fine if you're Railway or Notion or Linear. You have strong engineering teams and AI is core to your competitive positioning. But most SaaS companies are not Railway.

Consider what happens when a new model drops. Anthropic releases Claude 4. OpenAI releases GPT-5. You want to use it because it's better, faster, cheaper. But your entire AI layer was built against the previous model's behavior. Prompts that worked perfectly now produce different outputs. Edge cases that were handled are suddenly not. The tone changes. The formatting changes. Capabilities shift in ways you did not expect.

So you test. You regression test every prompt, every integration point, every edge case. That takes a sprint. Maybe two. And while you're testing, your competitor who uses a proper AI infrastructure layer switched models in a config file and was live the same week.

Multiply this by every model release, every product feature addition, every new API endpoint you expose to the AI. The maintenance burden does not grow linearly. It compounds. By year two, the AI layer is one of the most expensive things you maintain, and it is not even your product. It is the interface to your product.

The math

There are roughly 30,000 B2B SaaS companies. The number varies depending on how you count, but it's in that range. Of those, maybe 200 have the engineering capacity to build and maintain what Railway built. Large teams, strong AI expertise, willingness to invest in non-core infrastructure.

That leaves 29,800 companies whose users are about to expect a conversational layer, because they've used Railway or Notion or Linear and now every other product feels like going back to the stone age.

Those 29,800 companies have three options.

Option 1: Build It

Hire an AI team. Pick a model. Write prompts. Build integrations. Maintain it forever. Most companies will try this, underestimate the ongoing cost, and ship something mediocre that erodes user trust.

Option 2: Ship a Chatbot

Wrap an LLM API, add some system prompts, call it an “AI assistant.” Works for simple queries. Fails catastrophically on anything complex. Users learn to ignore it within a month.

Option 3: Use Infrastructure

Adopt a purpose-built runtime that handles model management, guardrails, learning, observability, and methodology. Focus your engineering on domain-specific configuration, not AI plumbing.

Option 1 is expensive and distracting. You hired engineers to build your product, not to build AI infrastructure. Every sprint spent on prompt engineering and model testing is a sprint not spent on your actual product.

Option 2 is worse than nothing. A bad AI assistant teaches users that your AI layer is unreliable. They stop using it. Then when you ship the good version later, they do not try it because they already learned the lesson. First impressions in AI are extremely sticky. Users who have a bad experience with an AI feature are three to five times less likely to try it again after an update. You are better off shipping nothing than shipping something that suggests deleting production databases.

Option 3 is the only one that scales. The same way most companies do not build their own database, their own authentication system, or their own payment processing. They use infrastructure built by teams that specialize in it. The conversational layer is infrastructure. It should be treated as such.

What doing it right actually requires

The infrastructure layer for a reliable conversational AI is more substantial than most people realize. Here is what separates a chatbot from an agent that users actually trust.

Model selection and testing

Not just “pick Sonnet.” The ability to run the same prompts against multiple models, measure response quality, compare latency and cost, and switch models without rewriting integrations. Models improve every quarter. If changing models is a project, you will always be a quarter behind.
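A rough sketch of what "run the same prompts against multiple models" could mean in practice. It assumes a model can be treated as a plain function from prompt to response text, and that each test case carries its own grader. The stub models, the `EvalResult` type, and the single test case are hypothetical stand-ins for real provider clients and a real evaluation suite.

```python
import time
from dataclasses import dataclass
from typing import Callable

# A model is any callable from prompt text to response text (an assumption;
# a real harness would wrap a provider client behind this signature).
ModelFn = Callable[[str], str]
Grader = Callable[[str], bool]

@dataclass
class EvalResult:
    model: str
    pass_rate: float
    avg_latency_s: float

def evaluate(models: dict[str, ModelFn],
             cases: list[tuple[str, Grader]]) -> list[EvalResult]:
    """Run every (prompt, grader) case against every model and rank them."""
    results = []
    for name, model in models.items():
        passed, elapsed = 0, 0.0
        for prompt, grader in cases:
            start = time.perf_counter()
            answer = model(prompt)
            elapsed += time.perf_counter() - start
            passed += grader(answer)  # True counts as 1
        results.append(EvalResult(name, passed / len(cases), elapsed / len(cases)))
    # Best pass rate first; lower latency breaks ties.
    return sorted(results, key=lambda r: (-r.pass_rate, r.avg_latency_s))

# Stub models for illustration only.
def old_model(prompt: str) -> str:
    return "Delete the Postgres service and recreate it."

def new_model(prompt: str) -> str:
    return "First check that both services share an internal network."

cases = [("Postgres won't connect", lambda a: "delete" not in a.lower())]
ranking = evaluate({"old": old_model, "new": new_model}, cases)
```

Because the integration code only ever sees `ModelFn`, swapping the model behind that name is a configuration decision, and the evaluation suite tells you whether the swap is safe.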

Methodology, not just prompts

Prompts tell the model what to say. Methodology tells the model how to think. When a user reports a Postgres connection error, a prompt might say “help the user fix the connection.” A methodology says: check the network configuration first, then verify the port binding, then validate the connection string format, then check DNS resolution. Never suggest deleting a resource as a troubleshooting step.

Methodology is domain expertise encoded as structured reasoning. It is what separates a junior developer who Googles the error message from a senior developer who works through the diagnostic tree. The model has the raw intelligence. Methodology gives it the experience.

# Connection Troubleshooting Methodology

## When: User reports a service cannot connect to a database

## Steps (in order):
1. Check network configuration
   - Are both services on the same internal network?
   - Are network policies blocking the connection?

2. Verify port binding
   - Is the database listening on the expected port?
   - Is there a port conflict?

3. Validate connection string
   - Correct host, port, database name, credentials?
   - URL-encoded special characters in password?

4. Check DNS resolution
   - Can the service resolve the database hostname?
   - Is the service using the internal hostname?

## Never:
- Suggest deleting a resource as a troubleshooting step
- Skip diagnostic steps to jump to "recreate it"
- Assume the database is the problem without checking the client

That is a skill. It is not code. It is not a complex system. It is domain expertise written down in a format the model can follow. Any senior engineer could write it. And once written, every user benefits from it on every interaction.
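To show how little machinery a skill in that format needs, here is a sketch of a loader that turns it into structured data. It assumes the heading conventions from the example above (`## When:`, `## Steps`, `## Never`). The `Skill` type and `parse_skill` function are hypothetical and do not describe any particular runtime's loader.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Skill:
    title: str = ""
    trigger: str = ""
    steps: list[str] = field(default_factory=list)
    never: list[str] = field(default_factory=list)

def parse_skill(text: str) -> Skill:
    """Parse a skill file into its title, trigger, ordered steps, and never-rules."""
    skill = Skill()
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# "):
            skill.title = line[2:]
        elif line.startswith("## When:"):
            skill.trigger = line[len("## When:"):].strip()
        elif line.startswith("## Steps"):
            section = "steps"
        elif line.startswith("## Never"):
            section = "never"
        elif section == "steps" and re.match(r"\d+\.\s", line):
            # Keep only the numbered step; sub-questions are ignored here.
            skill.steps.append(re.sub(r"^\d+\.\s*", "", line))
        elif section == "never" and line.startswith("- "):
            skill.never.append(line[2:])
    return skill

SAMPLE = """\
# Connection Troubleshooting Methodology
## When: User reports a service cannot connect to a database
## Steps (in order):
1. Check network configuration
   - Are both services on the same internal network?
2. Verify port binding
## Never:
- Suggest deleting a resource as a troubleshooting step
"""

skill = parse_skill(SAMPLE)
```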

The key insight: methodology is composable. You write a connection troubleshooting skill, a deployment configuration skill, a resource scaling skill. The agent selects the right methodology based on the user's intent. When the user says “my app is slow,” the agent activates the performance diagnosis methodology, not the deployment methodology. When the user says “add a database,” it activates the provisioning methodology. The model's intelligence is directed by the right framework for the problem at hand.
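Skill selection can be sketched with a deliberately naive router. This version matches keywords. A production agent would more plausibly ask the model to classify intent, so treat the keyword table and the `select_skill` function as hypothetical placeholders for that step.

```python
import re
from typing import Optional

# Hypothetical keyword table mapping each skill to words that suggest it.
SKILL_KEYWORDS = {
    "connection_troubleshooting": {"connect", "connection", "refused", "timeout"},
    "performance_diagnosis": {"slow", "latency", "lag"},
    "provisioning": {"add", "create", "provision"},
}

def select_skill(message: str) -> Optional[str]:
    """Pick the skill whose keywords overlap most with the message, if any."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    scores = {name: len(words & kw) for name, kw in SKILL_KEYWORDS.items()}
    best = max(scores, key=lambda name: scores[name])
    # No overlap at all means no skill applies; the caller decides the fallback.
    return best if scores[best] > 0 else None
```

Returning `None` when nothing matches matters as much as the matching itself: it is the hook for "I'm not sure about this one" instead of an improvised answer.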

Without methodology, the model is a generalist improvising in a specialist domain. It is like asking a smart person with no medical training to diagnose an illness. They might get lucky. They might also recommend something harmful. With methodology, the model becomes a specialist. It follows the same diagnostic process a senior engineer would follow, consistently, every time.

Guardrails

The agent cannot suggest deleting production resources as a troubleshooting step. The agent cannot execute write operations without explicit confirmation. The agent cannot perform bulk destructive actions. These constraints are not limitations. They are what makes the system trustworthy.

Without guardrails, you get Railway's “just delete Postgres” moment. The assistant is helpful 90% of the time and potentially catastrophic 10% of the time. That 10% is what determines whether users trust the AI layer or learn to ignore it. Users do not average out their experiences. They remember the worst one. One bad suggestion that causes data loss outweighs a hundred good suggestions that saved time.

Guardrails also protect the company. An AI assistant that suggests destructive actions without confirmation is a liability issue. When the AI says “I'll delete this database to fix the connection” and the user clicks yes because the AI seemed confident, who is responsible for the lost data? Guardrails are not just user protection. They are risk management.
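As a sketch, the first rule above can be expressed as a policy check that runs on every suggestion before the user sees it. The verb list, the `Suggestion` fields, and the three outcomes are assumptions made for illustration, not a description of any shipping guardrail system.

```python
from dataclasses import dataclass

# Hypothetical set of verbs treated as destructive.
DESTRUCTIVE_VERBS = {"delete", "drop", "remove", "disable", "recreate"}

@dataclass
class Suggestion:
    verb: str       # what the agent wants to do, e.g. "delete"
    resource: str   # what it wants to do it to, e.g. "postgres"
    context: str    # "troubleshooting" or "user_requested"

def review(suggestion: Suggestion) -> str:
    """Return 'block', 'confirm', or 'allow' for a proposed suggestion."""
    destructive = suggestion.verb in DESTRUCTIVE_VERBS
    if destructive and suggestion.context == "troubleshooting":
        # Never offer a destructive action as a troubleshooting step.
        return "block"
    if destructive:
        # The user asked for it, but it still needs explicit confirmation.
        return "confirm"
    return "allow"
```

With a check like this in the path, "delete the Postgres service" as an answer to a connection error is rejected before it is ever shown.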

The learning loop

When a user hits a bad suggestion, that feedback needs to flow back into the system. Not through model fine-tuning. Through knowledge base updates. The system learns that “delete and recreate” is a bad response to connection errors. The next user gets a better answer. Over fifty sessions, the system has learned the real patterns, the actual failure modes, the edge cases that training data doesn't cover.

Session 51 is materially better than session 1. Not because the model improved, but because the system accumulated domain knowledge. This is the flywheel that no amount of prompt engineering can replicate.

Compare this to Railway's approach. Every user starts from zero. The model has the same training data, the same prompts, the same blind spots. User 1,000 gets the exact same “just delete Postgres” suggestion that user 1 got. All that usage generated zero institutional learning. That is the difference between a static system and a compounding one. Static systems can only improve when engineers ship updates. Compounding systems improve from use.
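A compounding system can start very small. The sketch below assumes feedback is recorded per topic and that the agent filters its candidate responses against what earlier sessions rejected. The `KnowledgeBase` class and its method names are hypothetical, and a real system would store much richer signals than a rejected-response set.

```python
from collections import defaultdict

class KnowledgeBase:
    """Remembers which responses users flagged as unhelpful, per topic."""

    def __init__(self) -> None:
        # topic -> set of responses that earlier sessions rejected
        self._rejected: dict[str, set[str]] = defaultdict(set)

    def record_feedback(self, topic: str, response: str, helpful: bool) -> None:
        if not helpful:
            self._rejected[topic].add(response)

    def filter_candidates(self, topic: str, candidates: list[str]) -> list[str]:
        """Drop candidate responses that earlier sessions rejected."""
        rejected = self._rejected[topic]
        return [c for c in candidates if c not in rejected]
```

No model weights change here. The model proposes the same candidates it always did; the system around it has learned which ones not to pass along.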

Observability

Product teams need to see every suggestion the agent makes. Filter by failure mode. Identify patterns. See whether, say, 15% of connection-related queries result in destructive suggestions. Track whether guardrails are catching bad actions. Measure resolution rates by topic. Compare model performance across different query types.

Without observability, the AI layer is a black box. You shipped it, users are using it, and you have no idea whether it's helping or quietly eroding trust. That is not a tenable position for any serious product team.

Traditional product analytics do not solve this. You can measure that users are interacting with the AI. You cannot measure whether the interactions are good. A user who asks the AI to delete their Postgres shows up in your analytics as an “engaged user.” You need AI-specific observability: what was suggested, whether it was acted on, whether the action succeeded, whether the user had to undo it. This is a different analytics problem than anything SaaS companies have built before.
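Those four questions map directly onto an event record. The sketch below assumes one event is logged per suggestion. The field names and the two metrics are illustrative, not a schema from any existing analytics product.

```python
from dataclasses import dataclass

@dataclass
class SuggestionEvent:
    topic: str         # e.g. "postgres_connection"
    suggestion: str    # what was suggested
    destructive: bool  # would acting on it destroy data or resources?
    acted_on: bool     # did the user act on it?
    succeeded: bool    # did the action succeed?
    undone: bool       # did the user have to undo it?

def destructive_rate(events: list[SuggestionEvent], topic: str) -> float:
    """Share of suggestions on a topic that were destructive."""
    relevant = [e for e in events if e.topic == topic]
    if not relevant:
        return 0.0
    return sum(e.destructive for e in relevant) / len(relevant)

def undo_rate(events: list[SuggestionEvent]) -> float:
    """Share of acted-on suggestions that the user later had to undo."""
    acted = [e for e in events if e.acted_on]
    if not acted:
        return 0.0
    return sum(e.undone for e in acted) / len(acted)
```

An "engaged user" metric counts every one of these events as a success. The undo rate is the number that tells you whether the engagement was worth having.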

Permission boundaries

The agent can read everything. It can query any API, load any configuration, inspect any resource. But writes require confirmation. Destructive operations require explicit approval with a clear description of what will happen. Bulk operations require itemized confirmation. The agent is powerful for information retrieval and cautious for state changes.

This is not a technical limitation. It is a trust architecture. Users need to know that the conversational layer cannot do anything irreversible without their explicit approval. Once that trust is established, usage increases dramatically because the downside risk is bounded.
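One way to sketch that trust architecture: every request carries a tier, and the tier decides which prompts the user must approve before anything runs. The `Tier` enum, the `Request` type, and the prompt wording are hypothetical, chosen only to make the read, write, and destructive rules concrete.

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    READ = "read"
    WRITE = "write"
    DESTRUCTIVE = "destructive"

@dataclass
class Request:
    tier: Tier
    action: str
    targets: list[str]

def confirmation_prompts(req: Request) -> list[str]:
    """Return the prompts the user must approve before execution."""
    if req.tier is Tier.READ:
        return []  # reads never prompt
    suffix = " This cannot be undone." if req.tier is Tier.DESTRUCTIVE else ""
    # One prompt per target, so bulk operations are confirmed item by item.
    return [f"Confirm: {req.action} {target}.{suffix}" for target in req.targets]

def may_execute(req: Request, approvals: list[bool]) -> bool:
    """Allow execution only if every required prompt was approved."""
    prompts = confirmation_prompts(req)
    return len(approvals) == len(prompts) and all(approvals)
```

A read passes with no approvals at all, while a bulk delete of two databases needs two separate yeses, each attached to a sentence that says what will happen.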

The trust equation
Users will rely on AI in proportion to how much they trust it. Guardrails, confirmation flows, and permission boundaries are not friction. They are what makes the system trustworthy enough to use for real work. Remove them and you get a demo that impresses for five minutes. Add them and you get a tool people use every day.

The window

The companies that already built their conversational layer are not the market. Railway, Notion, Linear will iterate on their own. They have the teams and the technical sophistication to do it. Good for them.

The market is the thousands of SaaS companies whose users are about to expect the same thing. Because once you've used Railway's assistant to spin up infrastructure in two minutes, going back to clicking through menus in any other product feels broken. User expectations ratchet up. They do not ratchet back down. This is the same thing that happened with search. Once Google made information findable in seconds, every website without decent search felt broken. The standard changed and it never changed back.

The companies that ship a conversational layer in the next 12 to 18 months will define the UX standard for their vertical. The ones that wait will play catch-up against competitors whose products simply feel easier to use. Not because the competitors have better features, but because the features are accessible through language instead of navigation.

This is the same dynamic that played out with mobile. The first companies in each vertical to ship a good mobile experience captured disproportionate market share. Not because mobile users were a different market, but because the experience delta was so large that users switched. The conversational layer is the same kind of experience delta.

Think about it from the user's perspective. You use Notion at work and it has a conversational AI that can find documents, create pages, summarize meeting notes, and update project statuses through natural language. Then you switch to your company's project management tool and you have to click through seven menus to update a status field. The second tool is not worse in any objective sense. It just feels worse. And “feels worse” is what drives switching decisions for SaaS products.

The enterprise buyers who evaluate software for their organizations are using these AI layers in their personal tools. They are forming expectations. When they evaluate your product and it does not have a conversational layer, they will not articulate it as a missing feature. They will just say it “feels dated” or “the UX is clunky.” The conversational layer is becoming a baseline expectation the same way responsive design became a baseline expectation in 2014. Nobody asked for it by name. Products that did not have it just felt wrong.

And like mobile, the gap between “we have it” and “it's good” is enormous. A bad conversational layer is worse than none at all. It teaches users that the AI button is unreliable, and they stop clicking it. You get one shot at first impressions. Shipping a chatbot that suggests deleting production databases is not the first impression you want.

There is a particular trap for companies that wait too long and then rush. They see a competitor ship a conversational layer, panic, and throw together a chatbot in six weeks. The chatbot handles simple queries, fails on anything complex, and generates a wave of negative user sentiment that takes months to recover from. The competitor, meanwhile, has been iterating for six months and their AI layer is getting better every week because it has a learning loop. The gap widens, not narrows.

Speed matters, but speed without quality is destructive. This is why the infrastructure question is so important. The companies that can ship a good conversational layer quickly will be the ones who did not build it from scratch. They used infrastructure that gives them guardrails, methodology, and observability on day one, and let them focus their engineering time on domain-specific configuration instead of AI plumbing.

What we're building

This is what Amodal is for.

We build the infrastructure layer that makes reliable conversational AI possible for the other 29,800 companies. Skills that encode domain methodology. Guardrails that prevent destructive suggestions. A learning loop that gets smarter with every session. Observability that gives product teams visibility into every interaction. Model management that lets you switch models without rewriting your integration. Permission boundaries that make the system trustworthy enough for production use.

The stuff that turns a chatbot into an agent. The stuff that makes the difference between a demo and a product.

Railway showed that the conversational layer works. The question for every other SaaS company is not whether to add one, but how to build it right. That is the question we spend every day on.

Amodal is an open-source agent runtime. Skills, guardrails, model management, observability, and the learning flywheel. The infrastructure that turns a chatbot into a reliable agent.