Claude Architecture Explained: 5 Layers and 2 Cross-Cutting Planes

June 23, 2026 · How Claude Actually Works (part 31)

This final episode of “How Claude Actually Works” stitches together every concept from the series into a single cohesive mental model — five stacked layers and two cross-cutting planes. If you have been watching episodically, this is the map you pin to the wall. If you are brand new, this is the orientation that makes every other episode click into place.

The key insight is that these layers form a stack, not a menu. You cannot pick and choose which ones apply; each one depends on the one beneath it. A new Claude model ships? It drops into layer zero and the rest of the stack carries on unchanged. A new tool type arrives? Layer two handles it. That composability is the whole payoff.

The one-sentence version: The Claude stack is five layers (model → protocol → reach → orchestration → surfaces) plus two planes (prompts; structured output, context, reliability, and cost) that cut across every layer, and understanding that structure means you always know where a new Claude feature belongs.

The Five Layers

Layer 0 — The Model

Tokens in, tokens out. Nothing else. This is the absolute ground floor. Every other layer in the stack rests on this one. A new model release — different weights, different context window, different pricing — lands here and only here. The layers above do not care which specific model is running; they only care that something takes tokens and returns tokens.

Layer 1 — The Protocol

This is messages.create and stop_reason. It is the API surface that lets you actually communicate with the model in a structured way. The protocol gives you the vocabulary: user messages, assistant turns, tool calls, tool results. Without the model below it, the protocol has nothing to wrap. Without the protocol, you are typing directly at raw weights — which you cannot do.

POST /v1/messages
{
  "model": "claude-...",
  "max_tokens": 1024,
  "messages": [{ "role": "user", "content": "..." }],
  "tools": [...]
}

The stop_reason field, with values such as end_turn, tool_use, and max_tokens, tells you why the model stopped generating and therefore what your code should do next. This is the hinge that makes agentic loops possible.
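To make the branching concrete, here is a minimal sketch of how calling code might react to stop_reason. The function name next_step and the returned labels are hypothetical, chosen only for illustration; the sketch covers just the three values named above and treats anything else as unhandled.

```python
def next_step(stop_reason: str) -> str:
    """Map a stop_reason value to what the calling code might do next."""
    if stop_reason == "end_turn":
        return "done"        # the model finished its answer
    if stop_reason == "tool_use":
        return "run_tool"    # execute the requested tool and send the result back
    if stop_reason == "max_tokens":
        return "truncated"   # output hit the token cap; the caller decides what to do
    return "unhandled"       # any value this sketch does not cover


print(next_step("tool_use"))  # prints "run_tool"
```

Every branch is a decision for your own code: the protocol only reports why generation stopped.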

Layer 2 — Reach (Tools and MCP)

Reach is how the model touches the outside world. It encompasses tool definitions, tool calls, tool results, and the Model Context Protocol (MCP). Without the protocol layer below, there is no way to even describe tools to the model — they arrive in the tools array of the API request. A new tool type (a new MCP server, a new API integration) lands here. Same plug, same socket.

User prompt
    │
    ▼
[Layer 1: Protocol] ──► model sees tool definitions
    │
    ▼
[Layer 2: Reach] ──► model calls a tool ──► result comes back
    │
    ▼
[Layer 1: Protocol] ──► next turn begins
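The round trip in the diagram can be sketched in a few lines. The tool definition uses the name, description, and input_schema fields that the tools array expects, and the call and result follow the tool_use and tool_result content-block shapes. Everything else (the get_weather tool, the TOOL_HANDLERS registry, the run_tool_call helper, and the demo id) is a hypothetical stand-in, not part of any real integration.

```python
import json

# Tool definition as it would appear in the "tools" array of the request.
GET_WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def get_weather(city: str) -> dict:
    # Hypothetical stand-in: a real tool would call a weather service here.
    return {"city": city, "forecast": "sunny"}


# Hypothetical registry mapping tool names to local Python functions.
TOOL_HANDLERS = {"get_weather": get_weather}


def run_tool_call(tool_use: dict) -> dict:
    """Execute one tool_use block and wrap the output as a tool_result block."""
    handler = TOOL_HANDLERS[tool_use["name"]]
    output = handler(**tool_use["input"])
    return {
        "type": "tool_result",
        "tool_use_id": tool_use["id"],  # ties the result back to the call
        "content": json.dumps(output),
    }


# A tool_use block shaped like the one the model would emit (id is made up).
example_call = {
    "type": "tool_use",
    "id": "toolu_demo",
    "name": "get_weather",
    "input": {"city": "Lisbon"},
}
result = run_tool_call(example_call)
```

The model never runs get_weather itself. It only asks for the call; your code executes it and reports back in the next user turn.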

Layer 3 — Orchestration

Orchestration is what turns a single API call into an agent. It is the agentic loop: check stop_reason, execute the requested tool, feed the result back, and repeat until the model signals it is done. Orchestration also covers sub-agents (one Claude instance spawning and delegating to another), memory (injecting prior context), and any glue code that manages multi-step reasoning. Without reach (that is, without tools), the loop has no actions to choose between. It is just a text box that you keep submitting.

Orchestration concept    What it adds
Agentic loop             Turns single calls into multi-step workflows
Sub-agents               Lets one Claude delegate to another
Memory                   Persists context across turns or sessions
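The agentic loop itself is short. The sketch below runs it against a scripted stand-in for the model so that it executes without network access. In this sketch run_agent, fake_model, and fake_tool are all hypothetical names, and the response dictionaries only imitate the stop_reason and content fields of a real response.

```python
from typing import Callable


def run_agent(call_model: Callable[[list], dict],
              run_tool: Callable[[dict], dict],
              user_text: str,
              max_steps: int = 10) -> list:
    """Loop until the model stops asking for tools or max_steps is reached."""
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_steps):
        response = call_model(messages)
        messages.append({"role": "assistant", "content": response["content"]})
        if response["stop_reason"] != "tool_use":
            break  # end_turn, max_tokens, etc.: hand control back to the caller
        # Run every requested tool and return the results as the next user turn.
        results = [run_tool(block) for block in response["content"]
                   if block["type"] == "tool_use"]
        messages.append({"role": "user", "content": results})
    return messages


def fake_model(messages: list) -> dict:
    # Scripted stand-in: asks for a tool on the first turn, then answers.
    if len(messages) == 1:
        return {"stop_reason": "tool_use",
                "content": [{"type": "tool_use", "id": "toolu_1",
                             "name": "add", "input": {"a": 2, "b": 3}}]}
    return {"stop_reason": "end_turn",
            "content": [{"type": "text", "text": "The sum is 5."}]}


def fake_tool(block: dict) -> dict:
    # Stand-in tool executor that only knows how to add two numbers.
    total = block["input"]["a"] + block["input"]["b"]
    return {"type": "tool_result", "tool_use_id": block["id"],
            "content": str(total)}


history = run_agent(fake_model, fake_tool, "What is 2 + 3?")
```

The max_steps cap is a design choice of this sketch: it keeps a misbehaving loop from running indefinitely.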

Layer 4 — Surfaces

Surfaces are the shapes users actually see: Claude Code (the CLI agent), the Agent SDK for building managed agents programmatically, and whatever Anthropic ships next. A surface is an opinionated wrapper around orchestration. It makes choices about UI, interruption points, and output formatting so you do not have to. New surfaces land here; the layers below remain untouched.

The Two Planes

Behind all five layers sit two cross-cutting planes. They are not layers themselves — they do not have a specific position in the stack. Instead, they influence every layer simultaneously.

Plane 1 — Prompts. System prompts, user turn formatting, few-shot examples, context stuffing. The way you frame language shapes model behavior at layer zero and affects every layer above it. Covered in depth in the 4xx episodes.

Plane 2 — Structured output, context, reliability, and cost. JSON mode, token budgets, caching strategy, error handling, retry logic. These are cross-cutting concerns that matter equally whether you are at the protocol layer writing a one-shot call or at the orchestration layer managing a 50-step agent. Covered in the 6xx episodes.
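Retry logic is a good illustration of why these concerns are cross-cutting: the same wrapper can sit around a one-shot call or around a single step of a long agent run. Below is a minimal sketch of retry with exponential backoff. The with_retries helper, its defaults, and the flaky function are hypothetical, and a production version would likely retry only on specific error types rather than on every exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T],
                 attempts: int = 3,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise RuntimeError("attempts must be at least 1")


# Demo: a call that fails twice with a transient error, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays = []  # record the delays instead of actually sleeping
print(with_retries(flaky, sleep=delays.append))  # prints "ok"
```

Passing sleep as a parameter is only there to keep the demo fast; by default the wrapper uses time.sleep.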

The Stack as a Mental Model

The reason this framing is useful is that it absorbs change gracefully:

  • A new model ships → Layer 0. The stack absorbs it. Nothing above changes.
  • A new tool type or MCP server ships → Layer 2. Same plug, same socket.
  • A new surface ships (Claude for desktop? A new IDE integration?) → Layer 4. The layers below still work identically.

When you read a Claude release note and ask “where does this go?”, the stack gives you the answer immediately. That is the payoff: five layers, two planes, one mental model you carry into every Claude project.
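As a toy illustration of the "where does this go?" question, the stack can be written down as a lookup. The Layer enum mirrors the five layers above; the LANDS_IN table and the where_does_it_go function are hypothetical and only encode the examples already given in this section.

```python
from enum import Enum


class Layer(Enum):
    MODEL = 0
    PROTOCOL = 1
    REACH = 2
    ORCHESTRATION = 3
    SURFACES = 4


# Hypothetical change categories, taken from the examples in the text.
LANDS_IN = {
    "new model": Layer.MODEL,
    "new tool type": Layer.REACH,
    "new MCP server": Layer.REACH,
    "new surface": Layer.SURFACES,
}


def where_does_it_go(change: str) -> Layer:
    """Return the layer that absorbs a given kind of change."""
    return LANDS_IN[change]


print(where_does_it_go("new MCP server").name)  # prints "REACH"
```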

Episode Map

Each episode in the course corresponds to a layer or a plane:

Episode(s)    Coverage
101–102       Layer 0 — The Model
103           Layer 1 — The Protocol
201–202       Layer 2 — Reach (tools and MCP)
3xx           Layer 3 — Orchestration
5xx           Layer 4 — Surfaces
4xx           Plane: Prompts
6xx           Plane: Context, reliability, and cost

Common Misconceptions

  • “The layers are independent — I can skip the ones I don’t need.” No. Each layer depends on the one below it. You cannot use orchestration without reach, reach without the protocol, or the protocol without a model. It is a stack for a reason.
  • “MCP replaces the protocol layer.” MCP lives at the reach layer (layer 2). It is a standardisation of how tools are described and called, not a replacement for the underlying messages.create API.
  • “Surfaces like Claude Code are completely separate products.” They are layer-4 wrappers around the same underlying stack. Understanding the lower layers helps you understand and extend these tools, even when you are just a user of them.
  • “Prompts are a layer 1 concern.” Prompts are a cross-cutting plane. A poorly written system prompt degrades model behavior at layer zero and cascades upward through every layer above it.

Frequently Asked Questions

Where does RAG (retrieval-augmented generation) fit in the stack? RAG is primarily a reach concern (layer 2), because you are using a tool or MCP server to retrieve documents. It also touches both planes: the retrieved chunks have to be injected into the prompt, and in a way that keeps context size, cost, and reliability in check.

If I am building with the Agent SDK, which layers am I working at? The Agent SDK is a surface (layer 4) that manages orchestration (layer 3) for you. You are still responsible for reach configuration (layer 2) — which tools and MCP servers to register — and the prompts plane (how you write system prompts for each agent).

Does a multi-agent system change the stack structure? No, it multiplies it. Each sub-agent is its own stack instance. Orchestration at layer 3 coordinates between stacks, but each individual Claude call still follows the same model → protocol → reach → orchestration path.

When Anthropic ships a new Claude model, do I need to change my orchestration code? Typically no. Layer 0 absorbs the new model; the protocol, reach, and orchestration layers are model-agnostic. You may want to update your prompts to take advantage of new capabilities, and you should check for any context-window or pricing differences, which belong to the second plane (context, reliability, and cost). Structurally, though, the stack holds.
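In request terms, a model swap can be as small as one string. The sketch below builds two request bodies and checks which fields differ. The build_request helper and the model identifiers model-id-old and model-id-new are placeholders, not real model names, and the max_tokens default is an arbitrary assumption.

```python
def build_request(model: str, user_text: str, tools: list,
                  max_tokens: int = 1024) -> dict:
    """Assemble a request body; the model id is just one parameter among several."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": user_text}],
        "tools": tools,
    }


# Placeholder ids standing in for an old and a new model.
old = build_request("model-id-old", "Summarise this ticket.", tools=[])
new = build_request("model-id-new", "Summarise this ticket.", tools=[])

# Only the layer-0 field differs between the two requests.
changed = {key for key in old if old[key] != new[key]}
print(changed)  # prints {'model'}
```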

Where This Fits in the Series

This recap is the series finale of How Claude Actually Works. Every prior episode explored one layer or one plane in isolation; this episode assembles the complete picture. If you are reading this before watching the rest, use the episode map above to dive into whichever layer matters most to your current project. If you have watched every episode, this page is the cheat-sheet you bookmark.
