How Claude Works: A 5-Layer Mental Model for Developers
If you’ve ever felt like Anthropic’s documentation is a pile of loosely connected concepts — prompts here, MCP there, agents somewhere else — you’re not wrong. What’s missing is a single organizing structure that shows how everything fits together. That’s what the Claude Stack gives you.
The Claude Stack is a five-layer mental model, deliberately analogous to the five-layer internet model (physical / link / internet / transport / application). Each layer hides the complexity below it, and each one only makes sense once you understand the layer underneath. This is a teaching analogy, not a network protocol: the layering principle carries over, not the networking semantics.
The one-sentence version: Claude is just tokens in and tokens out — but five layers of infrastructure (model, API call, tool protocol, agent loop, and surface packaging) transform that into the AI applications you actually build and use.
The Five Layers at a Glance
Here’s the full stack, top to bottom, with the internet model equivalent for orientation:
| Layer | Name | Internet Analogy | What It Does |
|---|---|---|---|
| L4 | Surfaces | Application | Claude Code, Agent SDK, Managed Agents |
| L3 | Orchestration | Transport | The agent loop — runs until the task is done |
| L2 | Reach | Internet | Tools and MCP — the model touching the world |
| L1 | Protocol | Link | The Messages API — one call, one stop reason |
| L0 | The Model | Physical | Tokens in, tokens out |
Two cross-cutting planes slice through every layer:
- Prompts — shape how the model interprets everything above L0
- Cost & reliability — tokens, caching, tool latency, loop length, deployment surface
L0 — The Model (Tokens In, Tokens Out)
At the bottom sits the model itself. Three tiers:
- Haiku — fast and cheap; good for high-volume, low-complexity tasks
- Sonnet — the balanced default; where most production workloads live
- Opus — the heavyweight; slowest and most capable
The critical thing to internalize about L0: the model just turns tokens into tokens. That is all it does. It cannot read your database, call an API, or remember last Tuesday’s conversation. If you expect it to do any of those things without extra infrastructure, you’ll be confused. All the magic happens in the layers above.
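To make that concrete, here is a toy sketch. `fake_model` is a hypothetical stand-in invented for this example, not a real model or API call. The only point it makes is that anything the model appears to "remember" has to be present in the input you send.

```python
def fake_model(messages):
    """Hypothetical stand-in for L0: output depends only on the input it is handed."""
    transcript = " ".join(m["content"] for m in messages)
    if "my name is Ada" in transcript:
        return "Your name is Ada."
    return "I don't know your name."


# Call 1 tells the "model" a fact.
history = [{"role": "user", "content": "Hi, my name is Ada."}]
history.append({"role": "assistant", "content": fake_model(history)})

# Call 2 without history: the fact is gone, because nothing is stored in the model.
fresh_reply = fake_model([{"role": "user", "content": "What is my name?"}])

# Call 2 with history resent: the caller, not the model, supplied the memory.
history.append({"role": "user", "content": "What is my name?"})
remembered_reply = fake_model(history)
```

The second and third calls ask the same question. The only difference is what the caller chose to send.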
L1 — The Messages API (One Call, One Stop Reason)
You never talk to the model directly. You make a single HTTP POST to the Messages API, and the API returns a response that includes a stop_reason field. That field is everything.
```json
{
  "stop_reason": "tool_use",
  "content": [
    {
      "type": "tool_use",
      "name": "read_file",
      "input": { "path": "/tmp/data.csv" }
    }
  ]
}
```
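For orientation, here is roughly what that single POST looks like when built with only the Python standard library. Treat it as a sketch under assumptions: `build_request` and `send` are helper names made up for this example, the API key and model ID are placeholders you must replace, and in practice most people use the official SDK rather than raw HTTP.

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_request(api_key, model, user_text, max_tokens=1024):
    """Build (but do not send) the single HTTP POST that L1 consists of."""
    body = {
        "model": model,            # caller supplies a real model ID
        "max_tokens": max_tokens,  # required by the API; 1024 is an arbitrary example
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the parsed JSON reply (needs network and a valid key)."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


# Placeholders only: nothing is sent until send(req) is called.
req = build_request("YOUR_API_KEY", "YOUR_MODEL_ID", "Hello")
```

Calling `send(req)` with real credentials would return a JSON object whose `stop_reason` field drives everything in the layers above.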
The four stop_reason values you will handle most often (newer API versions add a few others, such as pause_turn and refusal):
| Value | Meaning |
|---|---|
| end_turn | Model is done. Stop. |
| tool_use | Model wants a tool run. Call it and come back. |
| max_tokens | Hit the token limit. Handle it. |
| stop_sequence | A custom stop string matched. |
Hold onto stop_reason. Half of L3 is built on it.
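One way to picture the table is as a dispatch map. The sketch below is illustrative only: `next_action` and the action names it returns are invented for this example, and the one design choice worth copying is that an unrecognized value fails loudly rather than falling through.

```python
def next_action(stop_reason):
    """Map a stop_reason to what the calling code should do next."""
    actions = {
        "end_turn": "finish",
        "tool_use": "run_tool_and_call_again",
        "max_tokens": "handle_truncation",
        "stop_sequence": "finish",
    }
    # Unknown values raise instead of being treated as "done" by accident
    if stop_reason not in actions:
        raise ValueError(f"unhandled stop_reason: {stop_reason!r}")
    return actions[stop_reason]
```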
L2 — Reach: Tools and MCP
A model at L0 can only describe what it wants. It cannot act. stop_reason: "tool_use" is the model saying “I want to call this function with these arguments” — but it’s still just text.
Tools are the contract that turns that description into a real action. You define available tools in your API request; the model picks one when it needs external information or side effects; you execute it and return the result.
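As a sketch of what that contract looks like, here is a definition for the `read_file` tool from the earlier example. A tool definition carries a name, a description, and a JSON Schema for its input. The description text and the schema details here are assumptions written for illustration.

```python
read_file_tool = {
    "name": "read_file",
    "description": "Read a text file from disk and return its contents.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Absolute path of the file to read"},
        },
        "required": ["path"],
    },
}

# Sent alongside the messages in the request body, e.g. tools=[read_file_tool]
tools = [read_file_tool]
```

The model only ever sees this description. Nothing in it runs code; execution is still your job.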
MCP (Model Context Protocol) standardizes this contract so that tools can be defined and served independently of your application code — servers you install, community-built integrations, your own internal APIs. MCP is L2: the model’s reach into the world.
Model describes action (L0 text)
↓
Messages API returns tool_use stop_reason (L1)
↓
Your code (or MCP server) executes the tool (L2)
↓
Result goes back into the next API call (L1 again)
L2 only makes sense once you’ve accepted L0. The model produces text. MCP is the bridge from text to action.
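The "your code executes the tool" step can be sketched as a small registry lookup. Assumptions to note: `TOOL_REGISTRY`, `execute_tool_use`, and the local `read_file` implementation are hypothetical names for this example, and blocks are treated as plain dicts. In a real response each tool_use block also carries an `id`, which the matching tool_result has to echo back as `tool_use_id`.

```python
from pathlib import Path


def read_file(path):
    """Hypothetical local implementation of the read_file tool."""
    return Path(path).read_text(encoding="utf-8")


TOOL_REGISTRY = {"read_file": read_file}


def execute_tool_use(block):
    """Turn one tool_use block (a dict) into the tool_result block for the next call."""
    try:
        output = TOOL_REGISTRY[block["name"]](**block["input"])
        return {"type": "tool_result", "tool_use_id": block["id"], "content": output}
    except Exception as exc:  # report failures to the model rather than crashing the loop
        return {
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }
```

Returning the error as a result, instead of raising, gives the model a chance to correct its arguments on the next call.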
L3 — Orchestration: The Agent Loop
An agent is not magic. An agent is a while loop.
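```python
import anthropic

# user_input, model, tools, run_tool, and handle_limit are yours to define
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
messages = [{"role": "user", "content": user_input}]
while True:
    # max_tokens is required; 1024 is an arbitrary example value
    response = client.messages.create(model=model, max_tokens=1024, tools=tools, messages=messages)
    if response.stop_reason == "tool_use":
        # Echo the assistant turn, then answer each tool_use block by its id
        messages.append({"role": "assistant", "content": response.content})
        results = [
            {"type": "tool_result", "tool_use_id": block.id, "content": run_tool(block)}
            for block in response.content if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    elif response.stop_reason == "max_tokens":
        handle_limit()  # Your problem now
        break
    else:
        break  # end_turn or stop_sequence: done
```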
That’s L3. The loop reads stop_reason from L1, dispatches tools from L2, and keeps calling the model until the task is complete. Additional concerns at L3 include:
- Hooks — code that fires at lifecycle points (before/after tool calls, on errors)
- Sub-agents — agents that spin up their own context windows to handle subtasks
- Memory — mechanisms to persist information across calls or sessions
You cannot understand agents without stop_reason from L1 and tools from L2. L3 is the composition of the two layers below it.
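Hooks and loop length are easier to reason about with the loop written as a function. This is a minimal sketch, not how any particular surface implements it: `run_agent`, `call_model`, `run_tool`, and the hook parameters are hypothetical names, and responses are modeled as plain dicts so the loop can be exercised without a network call.

```python
def run_agent(call_model, run_tool, messages, max_turns=10, before_tool=None, after_tool=None):
    """Agent loop with a turn cap and optional hooks around each tool call.

    call_model(messages) -> {"stop_reason": str, "content": list of block dicts}
    run_tool(block) -> str
    """
    for _ in range(max_turns):
        response = call_model(messages)
        if response["stop_reason"] != "tool_use":
            return response  # end_turn, max_tokens, etc.: hand back to the caller
        messages.append({"role": "assistant", "content": response["content"]})
        results = []
        for block in response["content"]:
            if block["type"] != "tool_use":
                continue
            if before_tool:
                before_tool(block)          # hook: e.g. log the call, or veto by raising
            output = run_tool(block)
            if after_tool:
                after_tool(block, output)   # hook: e.g. audit trail or metrics
            results.append({"type": "tool_result", "tool_use_id": block["id"], "content": output})
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"agent did not finish within {max_turns} turns")
```

The turn cap is the cheapest reliability feature you can add: a loop that never sees a final stop_reason stops costing you money after `max_turns` calls.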
L4 — Surfaces: The Things You Actually Install
At the top sit the packaged products — all four lower layers dressed for a specific use case:
- Claude Code — a terminal agent; the agentic coding assistant you run from your shell
- Agent SDK (formerly the Claude Code SDK) — the library version; build your own surfaces
- Managed Agents — Anthropic-hosted agents; someone else runs the loop for you
Every surface is just L0 through L3 with opinions about defaults, packaging, and deployment. When a new Anthropic product ships, the question to ask is: “which layer does this live at?” That question now has a concrete answer.
Common Misconceptions
- “stop_reason is a status code like HTTP 200/404.” No. It’s a dispatch signal that tells the agent loop what to do next. Each value implies a different code path, so it’s closer to an enum in a state machine than an HTTP status.
- “The model calls tools.” The model describes a tool call in its output. Your code (or an MCP server) actually runs it. The model never executes anything directly.
- “Agents are a special API.” An agent is your loop + the Messages API + tools. There is no separate “agent endpoint.” L3 is code you write (or that a surface like Claude Code writes for you).
- “Haiku, Sonnet, and Opus are interchangeable with different price tags.” They have different capability ceilings, latency profiles, and context window behaviors. Picking the wrong tier is a common source of both cost overruns and quality failures.
Frequently Asked Questions
What’s the difference between MCP and regular tool calling? Regular tool calling means you define tools inline in your API request and handle execution yourself. MCP standardizes the protocol so tools can be defined on separate servers and consumed by any MCP-compatible client. MCP is the ecosystem layer on top of the core tool-calling primitive.
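If stop_reason: "max_tokens" fires mid-task, is the conversation broken? Not necessarily, but you have to handle it. The reply hit the max_tokens output cap set on the request before the model finished, so what you got back is truncated. Common strategies include raising max_tokens and retrying, or asking the model to continue from the partial reply. If the real constraint is the context window, summarize the context or trim old messages. Prompt caching lowers the cost of resent tokens but does not free up room. Ignoring it silently is the bug.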
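A minimal sketch of not ignoring it: detect the truncated reply and ask for the rest, with a cap on retries. This is one possible approach under stated assumptions. `complete_with_continuation` and `call_model` are hypothetical names, `call_model` is assumed to flatten each reply to plain text, and the approach only suits text replies (a reply cut off in the middle of a tool call is better retried with a larger max_tokens).

```python
def complete_with_continuation(call_model, messages, max_continuations=3):
    """Keep asking for more while the reply is cut off by max_tokens.

    call_model(messages) -> {"stop_reason": str, "text": str} is a hypothetical
    wrapper around the Messages API that flattens the reply to plain text.
    """
    parts = []
    for _ in range(max_continuations + 1):
        response = call_model(messages)
        parts.append(response["text"])
        if response["stop_reason"] != "max_tokens":
            return "".join(parts)
        # Feed the partial answer back and ask the model to pick up where it stopped
        messages.append({"role": "assistant", "content": response["text"]})
        messages.append({"role": "user", "content": "Continue exactly where you left off."})
    raise RuntimeError("reply still truncated after retries")
```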
Does Claude Code use all five layers? Yes — that’s the point of L4. Claude Code is the four lower layers packaged into a terminal agent. It uses the model (L0), calls the Messages API (L1), has tools for file system, shell, and web access (L2), runs an agent loop (L3), and exposes itself as a surface you install (L4).
Where do prompts fit in the layer model? Prompts are a cross-cutting plane, not a layer. A system prompt shapes how the model interprets everything from L0 upward — it’s not isolated to one layer. Same with cost and reliability concerns: they show up as token counts at L0, caching decisions at L1, tool latency at L2, loop length at L3, and deployment choices at L4.
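To see the cost plane in code, here is a small accumulator you could feed from every call in a loop. It is a sketch with assumptions: `UsageTotals` is a made-up helper, it reads only the `input_tokens` and `output_tokens` counts reported with each response, it ignores cache-related counters, and prices are supplied by the caller rather than hard-coded because they vary by model and change over time.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0

    def add(self, usage):
        """Accumulate the usage counts reported with one Messages API response."""
        self.input_tokens += usage["input_tokens"]
        self.output_tokens += usage["output_tokens"]
        self.calls += 1

    def cost_usd(self, input_price_per_mtok, output_price_per_mtok):
        """Estimate spend from per-million-token prices supplied by the caller."""
        return (
            self.input_tokens * input_price_per_mtok
            + self.output_tokens * output_price_per_mtok
        ) / 1_000_000
```

Because every loop iteration resends the growing message history, input tokens usually dominate the total in long agent runs.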
Where This Fits in the Series
This tutorial is the orientation episode for the How Claude Actually Works course. It gives you the vocabulary and the five-layer scaffold so that every subsequent episode has a place to land. Episode 1 goes down to L0 — the model itself — and subsequent episodes work their way back up through the stack. When Anthropic ships a new feature and you’re unsure where it fits, come back to this diagram. See all tutorials for the full series.