Claude's 5-Layer Stack: MCP, Hooks, Skills, and Subagents Explained

June 23, 2026 · How Claude Actually Works (part 2)

▶ Watch on YouTube & subscribe to The Stack Underflow

Every time Anthropic ships something new — MCP transports, agent hooks, skills, subagents — the discourse follows a familiar pattern: excited thread, a dozen confused replies, and nobody quite agreeing on where the thing fits. The reason is that there is no shared map. This tutorial gives you that map: five horizontal layers and two vertical planes that together account for every feature Anthropic has shipped or is likely to ship.

Once you have the grid in your head, placing a new feature takes about five seconds. Layer, plane, dependency. That’s the whole filter.

The one-sentence version: Claude’s feature surface organizes into five layers (tokens → API call → tools → agent loop → installed products) and two cross-cutting planes (prompts, cost/reliability), and knowing which slot a feature occupies tells you whether you need it.

The Five Layers

L4  Claude Code / Agent SDK / Managed Agents / Plugins
L3  Agent loop · Hooks · Subagents · Sessions · Memory
L2  Tool choice · Input schema · MCP · Transports
L1  messages.create · stop_reason · Streaming · max_tokens
L0  Tokens · Context window · Extended thinking · Tokenizer

L0 — The Model Substrate

This layer is facts about the model itself. Tokens are the unit the model sees. The context window is its working memory. Extended thinking controls how much the model reasons before answering. The tokenizer is the converter that turns your plain text into those tokens.

Nothing at L0 knows about tools or agents. If a feature is purely about what the model is or how it processes text, it lives here.

L1 — The Single API Call

messages.create is the endpoint. Everything at this layer is a knob on a single HTTP request:

Concept	What it controls
`messages.create`	The API endpoint itself
`stop_reason`	Why the model stopped (end_turn, tool_use, max_tokens…)
Streaming	Whether the response arrives incrementally
`max_tokens`	Cap on output length

Nothing here knows about agents or multi-step loops either. It’s one request, one response.

L2 — Reaching the World (Tools)

This is where the model gets hands. Three key concepts live here:

Tool choice — controls when the model is allowed to call a tool (auto, required, none, or a specific tool name).
Input schema — describes the shape of each tool call in JSON Schema so the model knows what arguments are valid.
MCP (Model Context Protocol) — the standardized protocol by which external tools expose themselves to the model. The transports are how those connections physically run: stdio for local servers, streamable HTTP for remote ones.

MCP is not a layer on its own — it’s a feature that lives at L2, sitting on top of the raw tool-call machinery at L1.

L3 — Turning Calls into Behavior (The Agent Loop)

A single tool call is L2. The loop that reads stop_reason, decides whether to keep going, fires the tool, and feeds the result back — that’s L3. This is where an API call becomes an agent that actually does work.

Key features at this layer:

Agent loop — reads stop_reason == "tool_use" and keeps the process alive.
Hooks — fire at lifecycle events: pre_tool_use, post_tool_use, stop, and others. Hooks let you intercept, log, or mutate what happens at each step without changing the core loop.
Subagents — get their own context window so the parent agent doesn’t accumulate noise from every sub-task. You spawn a subagent when a task is large enough to pollute the parent’s window.
Sessions — persist state across multiple turns within a conversation.
Memory — carries information across sessions entirely.

L4 — Things You Actually Install or Open

This is the layer packaged for humans and teams:

Claude Code — the terminal agent most developers encounter first.
CLAUDE.md — the project-level config file Claude Code reads at the start of every session. It lives in your repo, not in any API call.
Agent SDK — the library for building agents programmatically (renamed from the Claude Code SDK in late 2025).
Managed agents — Anthropic-hosted agent infrastructure.
Plugins — extend Claude Code by bundling skills, hooks, and MCP servers together into a single installable unit.

All of L4 is the layers below, packaged and productized.

The Two Cross-Cutting Planes

Some features are not a layer — they cut across all of them.

The Prompts Plane

Few-shot examples — shape how the model interprets every layer above L0. They’re not a layer; they’re a pattern applied anywhere.
JSON schema enforcement via fake tool call — forces the model to output a specific structure by using the tool-call mechanism as an output formatter. It sits across L1 and L2 simultaneously.

The Cost and Reliability Plane

Prompt caching — reduces cost by avoiding re-encoding the same prefix on every call.
Compaction — summarizes long histories to keep them fitting inside the context window.
Evals — measure whether any of this is actually working. Evals are not a product feature; they’re the feedback loop behind every layer.

Cost & Reliability Plane         Prompts Plane
│  caching · compaction · evals  │  few-shot · schema enforcement
│                                │
└── cuts vertically across L0–L4 ┘

Skills: A Feature in Two Places

Skills are the one feature that deliberately lives at two layers. At L4 they ship as a feature of Claude Code — you install a plugin, get skills. At L3 they’re available via the Agent SDK, where you call them programmatically from your own agent loop.

Same feature, two valid homes. The rule here: name both, pick the one that matches your deployment context.

The Filter: Three Questions for Every New Release

The practical payoff of having this map is a decision filter. Next time something new ships and your feed lights up:

Which layer? Place the feature on the grid.
Which plane? Does it affect cost, reliability, or prompting behavior?
Dependency? Does it depend on something you’re already using?

If the answer to question 3 is no, you probably don’t need it yet. The stack is a filter before it’s a feature list.

Common Misconceptions

“MCP is its own layer.” MCP is a protocol feature at L2. It standardizes how tools expose themselves — it doesn’t add a new abstraction layer on top of the tool-call mechanism.
“Hooks are part of the model.” Hooks live at L3, in the agent loop runtime. The model itself (L0) has no concept of hooks; they’re lifecycle callbacks in the orchestration layer around it.
“Claude.md is an API feature.” CLAUDE.md is an L4 artifact — it’s Claude Code’s project-level config file. It doesn’t appear anywhere in the raw API.
“Skills and plugins are the same thing.” Plugins are installable bundles (L4) that can contain skills, hooks, and MCP servers. Skills are callable units of behavior that can surface at L3 (programmatic) or L4 (Claude Code). A plugin may include skills, but they’re not synonyms.

Frequently Asked Questions

Where does prompt caching fit — is it an L1 feature? It’s in the cost and reliability plane, not a layer. Prompt caching is a cross-cutting concern that operates by marking a prefix at the API level (L1), but its purpose is economic and applies across your whole stack. Think of it as a vertical slice, not a horizontal layer.

If subagents get their own context window, are they separate API calls? Yes. Spawning a subagent means issuing a fresh messages.create call with its own context. The parent agent delegates a task, the subagent runs independently, and the result comes back to the parent. The isolation is the point — it prevents the subagent’s tool-call noise from filling the parent’s window.

Do I need to understand all five layers to use Claude Code? No. Claude Code (L4) packages the layers below so you can use it without thinking about them. But when something breaks — a tool call misbehaves, a hook fires unexpectedly, memory doesn’t persist — knowing which layer is involved cuts debugging time dramatically.

Is the Agent SDK the same as the Claude Code SDK? They are the same thing under different names. Anthropic renamed the Claude Code SDK to the Agent SDK in late 2025 to better reflect that it’s a general-purpose library for building agents at L3, not just a companion to the Claude Code terminal product.

Where This Fits in the Series

This tutorial is the orientation episode for the “How Claude Actually Works” course — the map you’ll reference as the series goes deeper into each layer. The next episode descends to L0: the model itself, tokens, and the context window. Having the full grid in mind first makes each deep-dive land in its proper place rather than floating in isolation. Browse all tutorials to see the full course structure.

Found this useful? The deep version lives on YouTube — new breakdowns of how AI dev tools actually work, weekly.

Subscribe on YouTube →