How MCP Apps Work: Tools That Return Interactive UI

June 23, 2026 · AI Agent Internals: How Coding Agents Really Work (part 5)

▶ Watch on YouTube & subscribe to The Stack Underflow

Most people think a Model Context Protocol (MCP) tool can only return text back to the model. You call a tool, it hands back a string, the model reads it. That’s the mental model nearly everyone starts with — and it’s wrong, or at least badly out of date.

Modern MCP tools can return interactive UI: a form, a chart, a dashboard, a map, a multi-step widget — rendered right inside the chat, that the user can click and type into. This is the capability now standardized as MCP Apps. This tutorial breaks down exactly how it works, why it’s designed the way it is, and shows you the smallest example that actually does something.

The one-sentence version: an MCP tool returns a pointer to a UI resource; the host renders that resource in a sandboxed iframe; and the iframe talks back to the model over a JSON-RPC channel — so the model stays in the loop while the user interacts with a real interface.

The problem MCP Apps solve

Text is a terrible interface for some jobs.

If a tool returns “Here are 40 rows of data,” the model has to render the table as prose, you have to read that prose, and any follow-up (“sort by date”) is another full round-trip through the model. Now imagine the tool could return an actual sortable table. You click a column header. Done. No tokens spent, no round-trip.

That’s the gap. Some things text does well (explanation, reasoning). Some things a UI does well (live updates, direct manipulation, media, persistent state). MCP Apps let a single tool use both — the model reasons in text, the UI handles what text can’t.

How it actually works (the four moving parts)

There are four pieces, and once you see them the whole thing clicks:

1. The tool declares a UI resource. Instead of (or alongside) returning text, the tool result carries a _meta.ui.resourceUri field pointing at a UI resource the server exposes, typically an HTML document.
2. The host fetches and sandboxes it. The MCP host (Claude, ChatGPT, VS Code, Goose, etc.) reads that URI, fetches the resource from the server, and renders it inside a sandboxed iframe. Sandboxing matters: the server’s UI cannot touch the host app’s DOM, cookies, or other tools. It’s a guest in a locked room.
3. The UI and host talk over postMessage. The iframe communicates with the host using JSON-RPC over window.postMessage, a structured request/response channel rather than freeform messages. The UI can call back into the host, and the host can push updates into the UI.
4. The model stays in the loop. Crucially, when the user interacts with the UI, those actions can be surfaced back to the model. The model sees what happened and can respond. The UI isn’t a dead end that bypasses the AI; it’s an input surface for the AI.
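To make the first two pieces concrete, here is a minimal host-side sketch of the first decision a host makes: does this tool result point at a UI resource, or should it fall back to text? The types and helper names (`getUiResourceUri`, `textFallback`) are hypothetical, and accepting only `ui://` URIs is an assumption based on the scheme this article’s example uses; the real SDK ships its own type definitions.

```typescript
// Hypothetical types for illustration; the real SDK ships its own definitions.
interface TextContent {
  type: "text";
  text: string;
}

interface ToolResult {
  content: TextContent[];
  _meta?: { ui?: { resourceUri?: string } };
}

// Host-side check: does this tool result point at a UI resource to render?
function getUiResourceUri(result: ToolResult): string | undefined {
  const uri = result._meta?.ui?.resourceUri;
  // Assumption: only ui:// URIs are treated as renderable UI resources.
  return typeof uri === "string" && uri.startsWith("ui://") ? uri : undefined;
}

// Text-only fallback for hosts (or results) without a UI resource.
function textFallback(result: ToolResult): string {
  return result.content.map((c) => c.text).join("\n");
}

const result: ToolResult = {
  content: [{ type: "text", text: "Pick a priority:" }],
  _meta: { ui: { resourceUri: "ui://priority-picker" } },
};

const uri = getUiResourceUri(result);
console.log(uri ?? textFallback(result)); // prints ui://priority-picker
```

Keeping the text content alongside the pointer means a host that cannot render UI still has something sensible to show.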

Here’s the shape of the round-trip:

User asks → model calls tool → tool returns _meta.ui.resourceUri
        → host fetches UI resource → renders in sandboxed iframe
        → user clicks/types in the UI
        → UI sends JSON-RPC message over postMessage to host
        → host relays the action to the model → model responds
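The round-trip can be simulated in memory to show just the order of the hops. This is a toy: there is no real iframe, host, or model, and every name in it (`callTool`, `renderUi`, `modelInbox`, the `priority-selected` method) is hypothetical. The UI’s click is modeled as a JSON-RPC 2.0 notification, a message with no `id`, because it expects no reply.

```typescript
// Toy simulation of the MCP Apps round-trip. All names are hypothetical;
// nothing here talks to a real host, iframe, or model.

type JsonRpcNotification = {
  jsonrpc: "2.0";
  method: string;
  params?: Record<string, unknown>;
};

const log: string[] = [];
const modelInbox: JsonRpcNotification[] = []; // what the model is told about UI events

// Server side: one UI resource, and one tool whose result points at it.
const uiResources = new Map<string, string>([
  ["ui://priority-picker", "<button id='high'>High</button>"],
]);

function callTool(toolName: string) {
  log.push(`model calls tool ${toolName}`);
  return {
    content: [{ type: "text" as const, text: "Pick a priority:" }],
    _meta: { ui: { resourceUri: "ui://priority-picker" } },
  };
}

// Host side: fetch the resource, "render" it, and return the function the UI
// would use to reach the host (standing in for postMessage plus a listener).
function renderUi(uri: string): (msg: JsonRpcNotification) => void {
  if (!uiResources.has(uri)) throw new Error(`unknown UI resource ${uri}`);
  log.push(`host renders ${uri} in a sandbox`);
  return (msg) => {
    log.push(`host receives ${msg.method}`);
    modelInbox.push(msg);
    log.push("host relays the action to the model");
  };
}

// Run the loop once.
const toolResult = callTool("pick_priority");
const postToHost = renderUi(toolResult._meta.ui.resourceUri);

// The user clicks "High": the UI sends a notification (no id, no reply expected).
postToHost({ jsonrpc: "2.0", method: "priority-selected", params: { priority: "high" } });

log.push(`model sees priority=${String(modelInbox[0].params?.priority)}`);
console.log(log.join("\n"));
```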

A minimal example

Let’s build the smallest MCP App that does something real: a tool that returns a UI with a button, and clicking the button sends a value back to the model.

Note: the MCP Apps spec and SDK are evolving quickly. The shapes below show the pattern; check the official MCP Apps repo for the exact current API before you ship. The companion video walks through a live version.

1. The server declares a tool that returns a UI resource. Conceptually, the tool result carries a short text message alongside a reference to the UI resource:

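// Hypothetical tool handler: returns a short text message *and* a pointer
// to the UI resource the host should render.
function handlePriorityPicker() {
  return {
    // Text the model (and hosts without UI support) can still read.
    content: [{ type: "text", text: "Pick a priority:" }],
    _meta: {
      ui: {
        resourceUri: "ui://priority-picker"   // the host will fetch and render this
      }
    }
  };
}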

2. The server exposes the UI resource — an HTML document the host will load into the sandboxed iframe:

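<!-- ui://priority-picker -->
<button id="high">High</button>
<button id="low">Low</button>
<script type="module">
  // Browsers cannot resolve a bare package specifier on their own, so this
  // page must be bundled (or served with an import map) before the host loads it.
  import { App } from "@modelcontextprotocol/ext-apps";
  const app = new App();               // sets up the JSON-RPC/postMessage channel
  // Illustrative: constructor options, any connect() step, and the method name
  // used below may differ in the current SDK; check the ext-apps docs.

  for (const id of ["high", "low"]) {
    document.getElementById(id).onclick = () => {
      // Send the user's choice back to the host -> model.
      app.notify("priority-selected", { priority: id });
    };
  }
</script>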
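How the HTML reaches the host depends on your server framework, but the payload itself is an ordinary MCP `resources/read` result: a `contents` array whose entries carry `uri`, `mimeType`, and `text`. The framework-free handler below is a sketch; `readResource` and the registry are hypothetical, and the MIME type string is an assumption to verify against the ext-apps spec.

```typescript
// Hypothetical, framework-free resources/read handler for the UI resource.
// The result shape (contents[] with uri, mimeType, text) is MCP's ordinary
// resource payload; the MIME type string is an assumption to check against the spec.

const UI_MIME_TYPE = "text/html;profile=mcp-app";

interface TextResourceContents {
  uri: string;
  mimeType: string;
  text: string;
}

interface ReadResourceResult {
  contents: TextResourceContents[];
}

// The UI resources this server exposes, keyed by ui:// URI.
// (The page's <script> is omitted here for brevity.)
const uiRegistry = new Map<string, string>([
  [
    "ui://priority-picker",
    `<!doctype html>
<button id="high">High</button>
<button id="low">Low</button>`,
  ],
]);

function readResource(uri: string): ReadResourceResult {
  const html = uiRegistry.get(uri);
  if (html === undefined) {
    // A real server would answer with a JSON-RPC error; throwing keeps the sketch short.
    throw new Error(`Resource not found: ${uri}`);
  }
  return { contents: [{ uri, mimeType: UI_MIME_TYPE, text: html }] };
}

const res = readResource("ui://priority-picker");
console.log(res.contents[0].mimeType); // text/html;profile=mcp-app
```

In a real server you would register this through your SDK’s resource API rather than dispatching by hand; the point is that the UI is just another resource, addressed by the same URI the tool result points at.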

3. The host renders it, the user clicks, the model hears about it. When the user clicks “High,” the app.notify call travels over postMessage to the host, which relays it to the model — which can now say “Got it, marking this high priority” and call the next tool. No retyping, no parsing prose.

That’s the entire loop. Everything else — dashboards, forms, live-updating charts — is just a richer version of this same four-part dance.

Why the sandbox is non-negotiable

A natural question: why not let the UI run directly in the host? Because the UI comes from a third-party MCP server you may not fully trust. If that HTML could read the host’s DOM, it could scrape your conversation, steal tokens, or impersonate the host UI. The sandboxed iframe + structured JSON-RPC channel means the UI can only do what the protocol explicitly allows — request specific things, send specific messages — and nothing more. Capability, not access.
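On the host side, “capability, not access” comes down to two things: an iframe `sandbox` attribute that withholds same-origin access, and a gate that drops any message the protocol does not allow. The sketch below shows both in plain TypeScript; the allowlist contents and the `acceptUiMessage` name are hypothetical, and a browser host would also check that `event.source` is the iframe’s `contentWindow` before calling it.

```typescript
// Sketch of the host side of "capability, not access". The sandbox token and the
// method allowlist are illustrative assumptions, not values taken from the spec.

// Applied as <iframe sandbox="allow-scripts">: scripts run, but without
// allow-same-origin the frame gets an opaque origin and cannot read the host's
// DOM, cookies, or storage.
const IFRAME_SANDBOX = "allow-scripts";

interface JsonRpcMessage {
  jsonrpc: "2.0";
  id?: number | string;
  method: string;
  params?: unknown;
}

// Hypothetical allowlist: the only methods this host will act on from a UI.
const ALLOWED_METHODS = new Set(["priority-selected"]);

// Validate whatever arrived via postMessage (event.data) before acting on it.
function acceptUiMessage(data: unknown): JsonRpcMessage | null {
  if (typeof data !== "object" || data === null) return null;
  const msg = data as Partial<JsonRpcMessage>;
  if (msg.jsonrpc !== "2.0" || typeof msg.method !== "string") return null;
  if (!ALLOWED_METHODS.has(msg.method)) return null; // not a granted capability
  return msg as JsonRpcMessage;
}

console.log(acceptUiMessage({ jsonrpc: "2.0", method: "priority-selected", params: { priority: "high" } }) !== null); // true
console.log(acceptUiMessage({ jsonrpc: "2.0", method: "read-host-cookies" })); // null
```

Anything not explicitly allowed is dropped rather than interpreted, which is what keeps an untrusted UI from turning the channel into general-purpose access.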

Common misconceptions

“MCP tools only return text.” Out of date. They can return UI resources via _meta.ui.resourceUri.
“The UI replaces the model.” No — the model stays in the loop and sees user actions. The UI is an input surface, not a bypass.
“It’s a security hole to render server HTML.” Only if you skip the sandbox. The spec mandates the sandboxed iframe + structured channel precisely to contain it.
“This is the same as a normal web embed.” No — the bidirectional JSON-RPC channel to the model is the whole point. A plain embed can’t feed actions back to the AI.

Frequently asked questions

Which clients support MCP Apps today? Clients including Claude, ChatGPT, Goose, and VS Code have shipped support, with more coming. Always check the current support matrix before depending on it.

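Do I need a special SDK? Not strictly: the channel is JSON-RPC over postMessage, so you could wire it up yourself. In practice, the @modelcontextprotocol/ext-apps package provides an App class that handles the host/UI communication so you don’t have to hand-roll the postMessage plumbing.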
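For a sense of what that plumbing involves, here is a minimal hand-rolled JSON-RPC 2.0 channel: requests get an incrementing `id` and a pending promise, responses are matched back by `id`, and notifications carry no `id` at all. `JsonRpcChannel` is a hypothetical name, not the ext-apps API, and the injected `post` function stands in for `window.parent.postMessage(message, targetOrigin)` so the sketch runs outside a browser.

```typescript
// Minimal hand-rolled JSON-RPC 2.0 channel (hypothetical, not the ext-apps API).
// `post` stands in for window.parent.postMessage(message, targetOrigin).

type Params = Record<string, unknown>;

interface RpcRequest { jsonrpc: "2.0"; id: number; method: string; params?: Params }
interface RpcNotification { jsonrpc: "2.0"; method: string; params?: Params }
interface RpcResponse {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
}

type Pending = { resolve: (value: unknown) => void; reject: (err: Error) => void };

class JsonRpcChannel {
  private nextId = 1;
  private readonly pending = new Map<number, Pending>();
  private readonly post: (msg: RpcRequest | RpcNotification) => void;

  constructor(post: (msg: RpcRequest | RpcNotification) => void) {
    this.post = post;
  }

  // Requests carry an id; the promise settles when the matching response arrives.
  request(method: string, params?: Params): Promise<unknown> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.post({ jsonrpc: "2.0", id, method, params });
    });
  }

  // Notifications carry no id: fire-and-forget, and the receiver must not reply.
  notify(method: string, params?: Params): void {
    this.post({ jsonrpc: "2.0", method, params });
  }

  // Feed responses here from the "message" listener (after checking event.source).
  handleResponse(msg: RpcResponse): void {
    const entry = this.pending.get(msg.id);
    if (!entry) return; // unknown or already-settled id: ignore it
    this.pending.delete(msg.id);
    if (msg.error) entry.reject(new Error(msg.error.message));
    else entry.resolve(msg.result);
  }
}

// Usage with a fake host that records outgoing messages.
const sent: (RpcRequest | RpcNotification)[] = [];
const channel = new JsonRpcChannel((msg) => { sent.push(msg); });

const reply = channel.request("ping");
channel.handleResponse({ jsonrpc: "2.0", id: 1, result: { ok: true } });
reply.then((value) => console.log(value)); // { ok: true }
channel.notify("priority-selected", { priority: "high" });
```

Even this small version has to get id bookkeeping, error mapping, and stray responses right, which is the work an SDK class takes off your hands.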

Is this the same as MCP-UI? MCP-UI and the OpenAI Apps SDK pioneered these patterns; MCP Apps is the official extension that standardizes them. If you learned MCP-UI, the concepts carry over.

Where’s the spec? The official spec and SDK live in the modelcontextprotocol/ext-apps repository.

Where this fits in the series

This is part of How MCP Actually Works, a series that takes MCP apart piece by piece. If tools-that-return-UI is the first thing you’ve read here, the earlier parts cover what MCP is, how tools and resources are defined, and how a host talks to a server.

Sources & further reading: MCP Apps announcement · MCP-UI project · ext-apps spec & SDK

Found this useful? The deep version lives on YouTube — new breakdowns of how AI dev tools actually work, weekly.
