How Claude Tool Calling Actually Works: The Request-Execute Model

June 23, 2026 · How Claude Actually Works (part 6)

▶ Watch on YouTube & subscribe to The Stack Underflow

When developers first integrate Claude with external tools, there is a mental model that almost everyone gets wrong: they picture the model reaching out to an API, running a bash command, or querying a database. It does not. Claude is a sealed box. No network, no file system, no side effects — ever. What the model does do is ask, precisely and according to a schema you gave it, for something to be done on its behalf.

Understanding that distinction changes how you design integrations. The model is brilliant at reasoning about what to do next; it relies entirely on your code (or Anthropic’s servers) to actually do it.

The one-sentence version: Claude never executes a tool — it emits a structured request that something outside the model executes, then consumes the result in a follow-up API call.

The sealed box and the dotted line

Picture two halves separated by a dotted line that never moves:

┌──────────────────────────────────┐
│         Claude (the model)       │
│  reasons, decides, emits blocks  │
└────────────────┬─────────────────┘
                 │  tool_use block (a request)
                 ▼
- - - - - - - - - - - - - - - - - -   ← dotted line
                 │
                 ▼
┌──────────────────────────────────┐
│    The executor (your code or    │
│    Anthropic's servers)          │
│  actually runs things            │
└────────────────┬─────────────────┘
                 │  tool_result block
                 ▼
         appended to messages
         → new messages.create call

Everything Claude “does” with tools is on the top half. The bottom half is where work happens. The arrow only goes down as a request and only comes back up as a result.

What you hand the model: tool definitions

When you call the API, you pass a list of tool definitions. Each definition has four key fields:

Field	What it does
`name`	Identifier used in the emitted `tool_use` block
`description`	What the model actually reads to decide which tool to use
`input_schema`	JSON Schema describing the required parameters
`strict`	When `true`, Anthropic compiles the schema into a grammar that constrains the decoder

Two things worth burning into memory here:

Description quality is the real selector. The model picks a tool based on the description, not the name. Write it like you would write documentation for a junior engineer who has never seen your codebase. Ambiguous descriptions produce wrong tool calls. This matters more than how many tools you expose.

strict: true is free correctness. When you set strict: true, the JSON Schema is compiled into a constrained grammar at the decoder level. The model literally cannot output "2" when your schema expects an integer 2. Required fields cannot be omitted. This is not validation after the fact — it is enforced during generation.

The three executor lanes

Once Claude emits a tool_use block, your code routes it to an executor. There are three distinct lanes:

Lane A — User-defined tools

The tools you built. get_weather, send_slack_message, query_orders. You wrote the schema, you wrote the handler. This is what most tutorials show and stop at.

Lane B — Anthropic-schema client tools

Tools like bash, text_editor, computer, and memory. Anthropic publishes the schema so the model is trained to call them reliably, but your code still runs the command. The bash command executes in your environment, not Anthropic’s. You own the security boundary.

Lane C — Anthropic-hosted server tools

The newest category: web_search, web_fetch, code_execution, and tool_search. Here, Anthropic’s own servers execute the action. Your code never sees the raw outbound request — you only see the tool_result that comes back. The model still did not do anything; it asked, and Anthropic’s infrastructure did it.

The lane that ran does not change the message flow. Every lane wraps its output in a tool_result block and sends it back.

The round trip in detail

1. You call messages.create with tools[] + user message
2. Claude emits stop_reason: "tool_use" + a tool_use block
3. Your code routes to the right executor (Lane A / B / C)
4. Executor runs; result is wrapped in a tool_result block
5. You append both the tool_use and tool_result to messages[]
6. You call messages.create again (same conversation, new turn)
7. Claude now has the result; emits stop_reason: "end_turn" + final text

For a multi-step agent, steps 2–6 repeat in a loop. Each iteration is one tool call. The model decides at each turn whether to call another tool or to stop.

How many tools can you expose?

The older guidance was to keep it to four or five tools. That was overcautious for current models. The video is clear on what the real constraint is: each tool needs a description so unambiguous that the model could not reasonably pick the wrong one. Tool count is secondary; description quality is the gate. If you have ten tools with crisp, non-overlapping descriptions, that is safer than three tools with vague ones.

Common misconceptions

“Claude calls my API directly.” No. Claude emits a structured block requesting a call. Your code makes the actual network request.
“Setting strict: false is fine, I’ll validate afterwards.” You can, but strict: true catches schema violations at generation time for zero extra cost. Use it by default.
“Lane C tools mean Anthropic can see my data.” For Lane C tools like web_search, the data being fetched is public web content. The architecture is clear: Claude asks, Anthropic’s servers fetch, you see the result — but your private data never flows out unless you explicitly send it.
“More tools means a smarter agent.” More tools with weak descriptions means more misrouted tool calls. Description quality scales; quantity alone does not.

Frequently asked questions

Why does the model need a second messages.create call after a tool runs? Because the model has no memory between calls — it only knows what is in the messages array. After your code runs the tool, the result does not exist anywhere the model can see it until you append the tool_result block and make a fresh call. Each call is stateless; the conversation state lives in your messages list.

What happens if my tool handler throws an exception? The API does not know or care. You are responsible for returning a tool_result block regardless. Best practice: catch exceptions in your handler and return an error description inside the tool_result content. This lets the model decide whether to retry, use a different tool, or tell the user something went wrong — rather than the conversation silently stalling.

Can Claude call multiple tools in one turn? Yes. Claude can emit multiple tool_use blocks in a single response (when the model determines they are independent). Your code can run them in parallel, then return all the tool_result blocks together in one follow-up call. This is how you avoid unnecessary sequential latency in agents.

What is MCP and why was it mentioned at the end? MCP (Model Context Protocol) is a standard for defining and sharing tool schemas so you are not reinventing the same tool definitions by hand across every project. Instead of writing bespoke JSON Schema for read_file or web_search in every codebase, MCP provides a shared, versioned spec. The next video in the series covers it.

Where this fits in the series

This tutorial is part of How Claude Actually Works — a course that builds a mechanistic understanding of how Claude reasons, encodes context, and integrates with the world. Earlier episodes covered tokenization, context windows, and the inference loop. This episode focuses on the tool-calling layer: the exact boundary between what the model does and what your code does. The next episode moves up the stack to MCP, showing how tool definitions can be standardized across teams and systems. Browse all tutorials to follow the full series.

Found this useful? The deep version lives on YouTube — new breakdowns of how AI dev tools actually work, weekly.

Subscribe on YouTube →