Understanding stop_reason in the Claude Messages API

June 23, 2026 · How Claude Actually Works (part 5)

▶ Watch on YouTube & subscribe to The Stack Underflow

Every Claude API integration, from a one-shot chatbot to a multi-step autonomous agent, is built on a single call: messages.create. Of everything that call returns, two fields carry the weight: content and stop_reason. Most developers pay close attention to content. The ones who build reliable agents pay equally close attention to stop_reason.

This tutorial covers what stop_reason means, what each of its six core values requires you to do, and the anti-patterns that cause production agents to go off the rails.

The one-sentence version: stop_reason is the enum field that tells you why Claude stopped generating — and every control-flow decision in your agent loop must branch on it, never on the text content.

The protocol in one paragraph

The messages.create call is the foundation everything else rests on. You send it a model, a max_tokens limit, a messages array, and optionally a list of tools. Back comes a response object. Two fields matter most:

  • content: an array of content blocks, most commonly text blocks and tool_use blocks.
  • stop_reason: a string enum telling you why the model stopped producing output.

That’s the whole protocol at layer one. Everything an agent framework does is just a structured loop around this call.
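Because content is a list of typed blocks rather than a single string, it pays to read it by block type instead of indexing content[0]. The sketch below assumes blocks are plain dicts shaped like the JSON the API returns (the Python SDK hands you typed objects with the same fields, so swap the dict lookups for attribute access). The helper names text_of and tool_uses, the get_weather tool, and the toolu_01 id are hypothetical.

```python
from typing import Any

Block = dict[str, Any]


def text_of(content: list[Block]) -> str:
    """Concatenate every text block, skipping tool_use and any other block types."""
    return "".join(block["text"] for block in content if block.get("type") == "text")


def tool_uses(content: list[Block]) -> list[Block]:
    """Return only the tool_use blocks, in the order the model emitted them."""
    return [block for block in content if block.get("type") == "tool_use"]


# A response body trimmed to the two fields discussed above (illustrative values).
response = {
    "stop_reason": "tool_use",
    "content": [
        {"type": "text", "text": "Let me check the weather."},
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Paris"}},
    ],
}

print(response["stop_reason"])                              # tool_use
print(text_of(response["content"]))                         # Let me check the weather.
print([b["name"] for b in tool_uses(response["content"])])  # ['get_weather']
```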

Why you must never branch on the content text

Here is the classic anti-pattern:

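import anthropic

client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(model="claude-opus-4-5", max_tokens=1024, messages=messages)
text = response.content[0].text   # also breaks whenever the first block is not text

if "all done" in text.lower():
    mark_task_complete()   # ← this will bite you in production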

Checking prose for completion signals is fragile. The model might phrase things differently, use a different language, or include a summary sentence that looks like a terminator but isn’t. Branch on stop_reason. Always.

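import anthropic

client = anthropic.Anthropic()


class UnknownStopReasonError(Exception):
    """Raised when the API returns a stop_reason this loop does not handle."""


def run_agent(messages, tools):
    while True:
        response = client.messages.create(
            model="claude-opus-4-5", max_tokens=4096, messages=messages, tools=tools
        )
        match response.stop_reason:
            case "end_turn":
                return response.content   # clean finish, return to user
            case "tool_use":
                results = run_tools(response.content)   # your executor: returns tool_result blocks
                messages.append({"role": "assistant", "content": response.content})
                messages.append({"role": "user", "content": results})
                # loop around and call again
            case "max_tokens":
                # output was cut mid-thought: send the partial turn back to continue it
                # (the next response holds only the continuation text)
                messages.append({"role": "assistant", "content": response.content})
            case "stop_sequence":
                # your custom stop sequence fired: planned terminator
                return response.content
            case "pause_turn":
                # Anthropic paused a long-running turn: send it back unchanged to resume
                messages.append({"role": "assistant", "content": response.content})
            case "refusal":
                # model declined on safety grounds: surface to user, do not loop
                return response.content
            case _:
                # default branch: Anthropic may add values at any time
                raise UnknownStopReasonError(response.stop_reason)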

The six stop_reason values

Value         | What it means                                    | What you do
--------------|--------------------------------------------------|------------------------------
end_turn      | Model finished cleanly                           | Return content to the user. Loop ends.
tool_use      | Model wants to call a tool                       | Execute the tool, append the result, call the API again.
max_tokens    | Output was cut mid-thought                       | Continue the turn: call the API again. Do not treat as done.
stop_sequence | Your custom stop sequence was hit                | Handle as a planned, clean terminator.
pause_turn    | Anthropic paused a long-running turn server-side | Not an error. Resume by sending the assistant response back as the next request.
refusal       | Model declined on safety grounds                 | Do not retry blindly. Surface it gracefully to the user.
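The table collapses into one question per value: call the API again, or hand control back? A minimal sketch of that split, with the grouping taken straight from the table; the NEXT_STEP mapping and next_step helper are hypothetical names.

```python
# Grouping taken from the table: three values keep the loop going, three end it.
NEXT_STEP = {
    "end_turn": "stop",            # clean finish
    "tool_use": "call again",      # after appending tool results
    "max_tokens": "call again",    # continue the truncated turn
    "stop_sequence": "stop",       # planned terminator
    "pause_turn": "call again",    # resume the paused turn
    "refusal": "stop",             # surface to the user, no blind retry
}


def next_step(stop_reason: str) -> str:
    # Anything not in the table is a value added after this code was written.
    return NEXT_STEP.get(stop_reason, "unknown")


for value in NEXT_STEP:
    print(f"{value:<14}{next_step(value)}")
print(next_step("some_future_value"))  # unknown
```

Returning "unknown" instead of guessing keeps the decision about unfamiliar values in one place, where a default branch can log it and raise.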

end_turn

This is the happy path. The model finished what it was saying, decided it was done, and handed control back to you. Return the content to the user and exit the loop.

tool_use

This is the most important value for agent builders. When you see tool_use, the content array will contain one or more tool_use blocks, each with an id, a name, and an input object. Your job:

  1. Execute whatever the model asked for.
  2. Append the assistant’s response to the messages array.
  3. Append a user message containing one tool_result block per tool call, with each block's tool_use_id set to the id of the tool_use block it answers.
  4. Call messages.create again.

That loop — call → tool → append → call — is the seed from which every agent is grown.
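Steps 2 and 3 are where hand-rolled loops usually go wrong, by dropping the assistant message or mismatching ids. A sketch of building the tool_result message, assuming dict-shaped blocks; tool_results_message, the add tool, and the toolu_01 id are hypothetical, while the tool_result fields (type, tool_use_id, content, is_error) follow the Messages API block format.

```python
from typing import Any, Callable

Block = dict[str, Any]


def tool_results_message(content: list[Block],
                         tools: dict[str, Callable[..., Any]]) -> dict[str, Any]:
    """Run every tool_use block and pack the outputs into one user message."""
    results = []
    for block in content:
        if block.get("type") != "tool_use":
            continue
        try:
            output = tools[block["name"]](**block["input"])
            results.append({"type": "tool_result", "tool_use_id": block["id"],
                            "content": str(output)})
        except Exception as exc:
            # Report the failure to the model instead of crashing the loop.
            results.append({"type": "tool_result", "tool_use_id": block["id"],
                            "content": f"Error: {exc}", "is_error": True})
    return {"role": "user", "content": results}


messages = [{"role": "user", "content": "What is 2 + 3?"}]
assistant_content = [
    {"type": "tool_use", "id": "toolu_01", "name": "add", "input": {"a": 2, "b": 3}},
]
messages.append({"role": "assistant", "content": assistant_content})  # step 2
messages.append(tool_results_message(assistant_content,
                                     {"add": lambda a, b: a + b}))    # step 3
print(messages[-1])
```

Reporting a failed tool as an is_error result, rather than raising, lets the model see what went wrong and decide how to recover on the next call.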

  messages.create ◄─────────────────────────────┐
        │                                       │
        ▼                                       │
  stop_reason == "tool_use"?                    │
        │                                       │
        └─ yes → run tools → append results ────┘

max_tokens

The model hit the max_tokens limit you set on the request before it finished. The response content is incomplete, so do not treat it as a finished answer. Continue the turn by appending the partial assistant response to messages and calling the API again. The loop has the same shape as the tool_use loop; the only difference is that there are no tool results to add, so the partial assistant message is the last message you send.
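A sketch of continuing a truncated text answer, assuming a caller-supplied call function (hypothetical) that sends a messages list to messages.create and returns the response as a dict. It only handles plain text; a response cut off inside a tool_use block needs different treatment and is out of scope here.

```python
from typing import Any, Callable

Response = dict[str, Any]


def complete_text(call: Callable[[list], Response], messages: list, max_rounds: int = 5) -> str:
    """Continue a text answer for as long as stop_reason is "max_tokens".

    `call` is a hypothetical stand-in: it sends a messages list to
    messages.create and returns the response as a dict.
    """
    text = ""
    for _ in range(max_rounds):
        # After the first round, the partial answer rides along as the final
        # assistant message so the model continues it instead of starting over.
        prefix = [{"role": "assistant", "content": text}] if text else []
        response = call(messages + prefix)
        # The new response holds only the continuation, so append it.
        text += "".join(b["text"] for b in response["content"] if b["type"] == "text")
        if response["stop_reason"] != "max_tokens":
            return text
        # The API rejects a final assistant message that ends in whitespace.
        text = text.rstrip()
    raise RuntimeError(f"answer still truncated after {max_rounds} rounds")


# Usage with a fake call that truncates once, then finishes.
replies = iter([
    {"stop_reason": "max_tokens", "content": [{"type": "text", "text": "The answer is "}]},
    {"stop_reason": "end_turn", "content": [{"type": "text", "text": " 42."}]},
])
sent = []


def fake_call(msgs: list) -> Response:
    sent.append(msgs)
    return next(replies)


result = complete_text(fake_call, [{"role": "user", "content": "What is 6 x 7?"}])
print(result)  # The answer is 42.
```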

stop_sequence

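You asked for this. A stop sequence is a custom string you pass to the API in the stop_sequences parameter. When the model generates one of those strings, output stops at that boundary, and the response's stop_sequence field tells you which string matched. Handle it as a planned event, not an error.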
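A sketch of using a stop sequence as a planned terminator. The request dict mirrors the real stop_sequences parameter; the <answer> tag convention, the extract_answer helper, and the sample response are illustrative assumptions.

```python
from typing import Any

# Request body: stop_sequences lists the strings that should end generation.
request = {
    "model": "claude-opus-4-5",
    "max_tokens": 1024,
    "stop_sequences": ["</answer>"],
    "messages": [{"role": "user", "content": "Think, then give the answer inside <answer></answer>."}],
}


def extract_answer(response: dict[str, Any]) -> str:
    text = "".join(b["text"] for b in response["content"] if b["type"] == "text")
    if response["stop_reason"] == "stop_sequence":
        # Planned terminator: response["stop_sequence"] names the string that fired.
        text = text.split("</answer>", 1)[0]   # harmless if the terminator is absent
        return text.split("<answer>", 1)[-1].strip()
    if response["stop_reason"] == "end_turn":
        # Model finished without emitting the terminator; fall back to the raw text.
        return text.strip()
    raise ValueError(f"unexpected stop_reason: {response['stop_reason']!r}")


sample = {
    "stop_reason": "stop_sequence",
    "stop_sequence": "</answer>",
    "content": [{"type": "text", "text": "Capital of France. <answer>Paris"}],
}
print(extract_answer(sample))  # Paris
```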

pause_turn (trap #1)

As of March 2026, Anthropic enforces a server-side ceiling of roughly 20 tool calls per turn. If your agent hits that ceiling, the API returns pause_turn instead of an error. In practice this comes up with tools Anthropic runs for you server-side, since client-side tools already hand control back on every tool_use. This is not a failure; it is a checkpoint. Resume by appending the assistant response to messages unchanged and calling the API again; no new user message is needed. An agent loop that treats pause_turn as a terminal condition will silently abandon work mid-task.
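A sketch of the resume step, assuming a caller-supplied call function (hypothetical) that sends messages to messages.create and returns the response as a dict. The cap on resumes is a defensive choice of this sketch, not an API requirement.

```python
from typing import Any, Callable

Response = dict[str, Any]


def run_until_not_paused(call: Callable[[list], Response], messages: list,
                         max_resumes: int = 10) -> Response:
    """Resume paused turns by sending the assistant content back unchanged."""
    response = call(messages)
    resumes = 0
    while response["stop_reason"] == "pause_turn":
        if resumes >= max_resumes:
            raise RuntimeError("turn still paused after max_resumes attempts")
        # Append the paused assistant turn as-is; no new user message is needed.
        messages.append({"role": "assistant", "content": response["content"]})
        response = call(messages)
        resumes += 1
    return response


# Usage with a fake call that pauses once, then finishes.
responses = iter([
    {"stop_reason": "pause_turn", "content": [{"type": "text", "text": "Searching..."}]},
    {"stop_reason": "end_turn", "content": [{"type": "text", "text": "Done."}]},
])
call_sizes = []


def fake_call(msgs: list) -> Response:
    call_sizes.append(len(msgs))
    return next(responses)


history = [{"role": "user", "content": "Research this."}]
final = run_until_not_paused(fake_call, history)
print(final["stop_reason"], call_sizes)  # end_turn [1, 2]
```

The returned response still needs the normal stop_reason dispatch: a resumed turn can itself end in tool_use, max_tokens, or anything else.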

refusal (trap #2)

The model declined to continue on safety grounds. On Opus 4.7 and later, a refusal also includes a stop_details object that names the specific policy category that was triggered. That category is actionable:

  • Some categories suggest retrying with a rephrased or softer prompt.
  • Others indicate the request is firmly blocked and you should surface a clear message to the user.

Do not retry a refusal blindly. Read the category.
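One way to encode "read the category" is a small policy function. This is a sketch under assumptions: how you pull the category string out of stop_details is left to the caller because its field layout is not specified here, the category names in the usage lines are placeholders rather than real policy categories, and the at-most-one-retry rule is a design choice of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RefusalDecision:
    retry: bool   # try once more with a rephrased prompt?
    reason: str   # what to log or show the user


def decide_on_refusal(category: Optional[str], retryable: set,
                      already_retried: bool) -> RefusalDecision:
    if category is None:
        # No stop_details (e.g. an older model): nothing to act on, so surface it.
        return RefusalDecision(False, "Request declined; no category available.")
    if category in retryable and not already_retried:
        return RefusalDecision(True, f"Declined under {category}; retrying once, rephrased.")
    return RefusalDecision(False, f"Declined under {category}; surfacing to the user.")


RETRYABLE = {"placeholder_soft_category"}   # hypothetical category names
print(decide_on_refusal("placeholder_soft_category", RETRYABLE, already_retried=False).retry)  # True
print(decide_on_refusal("placeholder_soft_category", RETRYABLE, already_retried=True).retry)   # False
print(decide_on_refusal("placeholder_hard_category", RETRYABLE, already_retried=False).retry)  # False
```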

The default branch you must not skip

Anthropic can, and does, add new stop_reason values. Your switch/match statement needs a default (or case _:) branch. Code that silently falls through on an unknown value, or dies with an unhelpful KeyError, is not robust; code that raises a typed UnknownStopReasonError and logs the value gives you something to act on.
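A sketch of that default branch as a reusable check. The require_handled helper and the logger name are hypothetical, and UnknownStopReasonError is an illustrative exception defined here, not something the SDK provides.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("agent_loop")

# The six values this loop knows how to handle.
HANDLED = {"end_turn", "tool_use", "max_tokens", "stop_sequence", "pause_turn", "refusal"}


class UnknownStopReasonError(Exception):
    """Carries the unrecognized value so callers and alerts can inspect it."""

    def __init__(self, stop_reason: str) -> None:
        super().__init__(f"unhandled stop_reason: {stop_reason!r}")
        self.stop_reason = stop_reason


def require_handled(stop_reason: str) -> str:
    if stop_reason not in HANDLED:
        # Log before raising so the value is recorded even if a caller swallows the error.
        logger.error("unhandled stop_reason %r; update the agent loop", stop_reason)
        raise UnknownStopReasonError(stop_reason)
    return stop_reason


try:
    require_handled("brand_new_reason")
except UnknownStopReasonError as err:
    print(err.stop_reason)  # brand_new_reason
```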

Common misconceptions

  • “I can just check if the content text says ‘done’.” No. Text output is not a control signal. stop_reason is. Prose is for users; enum values are for code.
  • “pause_turn means something went wrong.” It does not. It means Anthropic paused a long-running turn to manage server resources. You resume it; you do not retry from scratch.
  • “max_tokens means the model gave a complete answer that just happened to be long.” Wrong. max_tokens means the output was cut. You need to continue the turn to get the rest.
  • “Refusals are transient errors I should retry in a loop.” Retrying blindly wastes quota and can make the situation worse. Read stop_details (where available) to decide whether a retry even makes sense.

Frequently asked questions

What happens if I append a tool_result but forget to call the API again? Your agent silently stops. From the user’s perspective the task just hangs or returns nothing. The loop only continues if you issue another messages.create. The API is stateless — it does not call you back.

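How do I handle max_tokens without duplicating the content? Append the truncated assistant response to messages exactly as it is (minus any trailing whitespace, which the API rejects at the end of a final assistant message), then call the API again with no new user message. The model picks up where it left off, and the new response contains only the continuation, so join it onto the text you already have. Some implementations add a “continue” user message for clarity, but the assistant response alone is sufficient.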

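Can stop_reason ever be null? In practice, the API always returns a non-null stop_reason for completed requests. When streaming, the message_start event carries stop_reason: null because generation has not finished yet; the real value arrives in the message_delta event near the end of the stream. If you see null, you are almost certainly reading message_start (or another event before the final message_delta) instead of the event that carries the answer.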
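A sketch that reads stop_reason from raw stream events, assuming they have already been parsed into dicts with the message_start and message_delta shapes (trimmed here to the relevant fields). If you use the Python SDK's messages.stream() helper instead, calling get_final_message() on the stream once it finishes gives you the assembled message, stop_reason included, without walking events yourself.

```python
from typing import Any, Iterable, Optional


def final_stop_reason(events: Iterable[dict[str, Any]]) -> Optional[str]:
    """Return the stop_reason from a sequence of parsed stream events."""
    stop_reason = None
    for event in events:
        if event["type"] == "message_start":
            # Always null here: generation has not finished yet.
            stop_reason = event["message"]["stop_reason"]
        elif event["type"] == "message_delta":
            # The real value arrives here, near the end of the stream.
            stop_reason = event["delta"]["stop_reason"]
    return stop_reason


events = [
    {"type": "message_start", "message": {"role": "assistant", "content": [], "stop_reason": None}},
    {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hi!"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence": None}},
    {"type": "message_stop"},
]
print(final_stop_reason(events[:1]))  # None
print(final_stop_reason(events))      # end_turn
```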

Is the 20 tool-call ceiling per turn or per session? Per turn. A turn is a single call to messages.create that results in the model generating a response. After a pause_turn resume, the ceiling resets for the new turn.

Where this fits in the series

This tutorial covers layer one of the agent stack — the core messages.create protocol and the stop_reason field that governs all control flow. The “How Claude Actually Works” course builds upward from here: the next episode introduces layer two (tool reach), and the one after that shows how the tool_use loop scales into a full agent. Understanding stop_reason cold is the prerequisite for everything that follows. Browse all tutorials to see the full sequence.

Found this useful? The deep version lives on YouTube — new breakdowns of how AI dev tools actually work, weekly.
