How to Guarantee JSON Output from Claude with Structured Outputs

June 23, 2026 · How Claude Actually Works (part 10)

▶ Watch on YouTube & subscribe to The Stack Underflow

If you have ever shipped a prompt that worked perfectly in testing and then silently exploded in production because the model prefaced its JSON with “Sure, here you go!” — this tutorial is for you. Getting an LLM to reliably return structured data is not about asking more politely. It is about removing the model’s ability to do anything else.

This video walks through the full journey from fragile string-parsing hope to a validated, typed object you can actually depend on — using Claude’s tool-call mechanism, Pydantic schemas, a bounded retry loop, and (as of 2026) Claude’s native output_config.format structured output API.

The one-sentence version: Instead of prompting the model to “return JSON,” force it to fill in a typed tool call — or use Claude’s native structured output — so the shape of the response is guaranteed by the API, not by the model’s mood.

Why “asking nicely” eventually fails

The video demonstrates this failure mode directly: fire the same polite prompt three times. The first two replies come back as clean JSON. On the third call, the model gets chatty, wraps the JSON in a markdown fence, and adds a preamble. Your JSON parser raises an exception.

This is not a bug you can tune away with temperature. It is a property of autoregressive generation — the model samples the next token, and sometimes that next token is S (for “Sure”). You need a structural constraint, not a softer instruction.
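To make the failure concrete, here is a minimal sketch using only the standard library. The two reply strings are made-up stand-ins for what a model might return; nothing here calls an API.

```python
import json


def try_parse(reply: str):
    """Return the parsed object, or None if the reply is not bare JSON."""
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        return None


# A markdown code fence, built indirectly so this example stays easy to embed.
fence = "`" * 3

# Hypothetical replies: one clean, one with a preamble and a fenced block.
clean_reply = '{"invoice_number": "INV-001", "vendor": "Acme", "total": 99.5}'
chatty_reply = f"Sure, here you go!\n{fence}json\n{clean_reply}\n{fence}"

parsed_clean = try_parse(clean_reply)
parsed_chatty = try_parse(chatty_reply)

print(parsed_clean)   # a dict with the three invoice fields
print(parsed_chatty)  # None, because the preamble is not valid JSON
```

The payload is identical in both replies. Only the wrapping differs, and that is enough to break a strict parser.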

The core pattern: schema → tool → forced call

The strategy has four stages:

Fragile prompt
    │
    ▼
Pydantic model  ──►  extract_invoice tool schema
                              │
                              ▼
                    tool_choice = { type: "tool", name: "extract_invoice" }
                              │
                              ▼
                    Tool use block (structured args only)
                              │
                              ▼
                    Schema validate gate (Pydantic recheck)
                              │
                         ┌────┴────┐
                      valid     invalid
                         │          │
                    typed Invoice   retry once (paste error as new user turn)

Step 1 — Start with a Pydantic model

Define your contract as a Pydantic model. For an invoice extractor, that looks roughly like:

from pydantic import BaseModel
from typing import Optional

class Invoice(BaseModel):
    invoice_number: str
    vendor: str
    total: float
    po_number: Optional[str] = None  # genuinely optional

Required fields are required. Optional fields are typed as Optional with a default of None. This distinction matters: because null is a valid value for po_number, the model is not pushed to invent a PO number when none exists in the document.
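As a quick illustration of how that contract behaves, the sketch below validates two hand-written payloads against the same model. It assumes Pydantic v2 (model_validate); the sample values are invented.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Invoice(BaseModel):
    invoice_number: str
    vendor: str
    total: float
    po_number: Optional[str] = None  # genuinely optional


# Omitting the optional field is fine: it falls back to None.
ok = Invoice.model_validate(
    {"invoice_number": "INV-001", "vendor": "Acme", "total": 99.5}
)

# Omitting a required field is not: Pydantic reports which one is missing.
try:
    Invoice.model_validate({"invoice_number": "INV-002", "total": 10})
    missing_fields = []
except ValidationError as exc:
    missing_fields = [err["loc"][0] for err in exc.errors()]

print(ok.po_number)
print(missing_fields)
```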

Step 2 — Morph the schema into a tool

Convert the Pydantic model into a tool definition named extract_invoice. The model’s job is no longer “write text” — it is “fill in this tool call.” The tool’s input schema is derived directly from the Pydantic model, so the shape is already specified precisely.
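One way to do the conversion, sketched under the assumption that you are on Pydantic v2 (model_json_schema). The description string is a placeholder of my own, and the API may not accept every keyword Pydantic emits, so treat the generated schema as a starting point.

```python
from typing import Optional

from pydantic import BaseModel


class Invoice(BaseModel):
    invoice_number: str
    vendor: str
    total: float
    po_number: Optional[str] = None


# A tool definition is a name, a description, and a JSON Schema for the input.
extract_invoice_tool = {
    "name": "extract_invoice",
    "description": "Record the fields extracted from an invoice.",  # placeholder wording
    "input_schema": Invoice.model_json_schema(),
}

print(extract_invoice_tool["input_schema"]["required"])
```

Because the schema is generated from the model, adding a field to Invoice updates the tool definition automatically.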

Step 3 — Force the tool call

Pass tool_choice with type: "tool" and the tool name. This is the key move. The model can no longer reply with prose; it can only emit structured tool arguments. The free-text escape hatch is closed.

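import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,  # required by the Messages API
    tools=[extract_invoice_tool],
    tool_choice={"type": "tool", "name": "extract_invoice"},  # force this tool
    messages=[{"role": "user", "content": raw_invoice_text}],
)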

The response comes back as a tool_use block packed with values — not a string you have to parse.
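Pulling the arguments out of the response can be sketched like this. The helper name first_tool_input is my own, and the SimpleNamespace object is a stand-in for a real response content block, which exposes type, name, and input attributes.

```python
from types import SimpleNamespace


def first_tool_input(content_blocks, tool_name):
    """Return the input dict of the first tool_use block for tool_name."""
    for block in content_blocks:
        if getattr(block, "type", None) == "tool_use" and getattr(block, "name", None) == tool_name:
            return block.input
    raise ValueError(f"no tool_use block found for {tool_name!r}")


# Stand-in for response.content; a real call would return SDK objects instead.
fake_content = [
    SimpleNamespace(
        type="tool_use",
        id="toolu_example",
        name="extract_invoice",
        input={"invoice_number": "INV-001", "vendor": "Acme", "total": 99.5},
    )
]

args = first_tool_input(fake_content, "extract_invoice")
print(args["vendor"])
```

With a live response you would pass response.content in place of fake_content.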

Step 4 — Validate gate and one retry

Run the tool arguments back through Pydantic. If validation passes, you have a real typed Invoice object. If it fails, do not give up — but do not loop forever either. Catch the ValidationError, paste it back as a new user turn explaining what went wrong, and call again. Set a hard ceiling of one retry. A bad document should not burn 20 API calls.
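Here is a minimal sketch of the validate gate with a bounded retry. To keep it runnable without an API key, the model call is injected as a plain function: call_model is a hypothetical callable that takes the message list and returns the tool arguments as a dict. The wording of the retry message is also my own.

```python
from typing import Callable, Optional

from pydantic import BaseModel, ValidationError


class Invoice(BaseModel):
    invoice_number: str
    vendor: str
    total: float
    po_number: Optional[str] = None


def extract_with_retry(
    call_model: Callable[[list], dict], raw_text: str, max_retries: int = 1
) -> Invoice:
    """Validate the tool arguments; on failure, retry at most max_retries times."""
    messages = [{"role": "user", "content": raw_text}]
    for attempt in range(max_retries + 1):
        args = call_model(messages)
        try:
            return Invoice.model_validate(args)
        except ValidationError as exc:
            if attempt == max_retries:
                raise  # hard ceiling reached: let the caller decide what to do
            # Paste the validation error back as a new user turn.
            messages = messages + [
                {
                    "role": "user",
                    "content": f"The previous extraction failed validation:\n{exc}\nPlease try again.",
                }
            ]
    raise RuntimeError("unreachable")


# Fake model: the first answer has a bad total, the second one is valid.
calls = []


def fake_call(messages):
    calls.append(len(messages))
    if len(calls) == 1:
        return {"invoice_number": "INV-001", "vendor": "Acme", "total": "not a number"}
    return {"invoice_number": "INV-001", "vendor": "Acme", "total": 99.5}


invoice = extract_with_retry(fake_call, "raw invoice text")
print(invoice.total, calls)
```

In a real pipeline, call_model would wrap the forced tool call and return the tool_use block's input. If the second attempt also fails, the ValidationError propagates so the document can be routed to a person.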

The modern way: output_config.format

As of 2026, Claude has native structured outputs. You can skip the tool wrapper entirely:

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,  # required by the Messages API
    output_config={"format": {"type": "json_schema", "schema": invoice_json_schema}},
    messages=[{"role": "user", "content": raw_invoice_text}],
)

This gives you schema-conforming JSON back directly, without wrapping your schema in a tool definition. You can also set strict: true on a tool definition to enforce its input schema. The forced tool-choice pattern still works and is valuable for understanding the underlying mechanism, but output_config.format is the direct modern approach. The old output_format parameter is deprecated, so stop using it.
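A rough sketch of the two ends of that call: building invoice_json_schema from the Pydantic model, and turning the returned JSON text into a typed object. Two assumptions are baked in and worth checking against the current API docs: that object schemas need additionalProperties set to false, and that the JSON arrives as the text of the first content block. The sample_text string stands in for that text.

```python
from typing import Optional

from pydantic import BaseModel


class Invoice(BaseModel):
    invoice_number: str
    vendor: str
    total: float
    po_number: Optional[str] = None


# Generate the schema from the same model used for validation.
invoice_json_schema = Invoice.model_json_schema()
# Assumption: structured outputs expect closed objects; verify in the docs.
invoice_json_schema["additionalProperties"] = False

# Stand-in for the JSON text of a real response (assumed: response.content[0].text).
sample_text = '{"invoice_number": "INV-001", "vendor": "Acme", "total": 99.5, "po_number": null}'

# Still run the result through Pydantic so the rest of your code gets a typed object.
invoice = Invoice.model_validate_json(sample_text)
print(invoice.vendor, invoice.po_number)
```

The API documents limits on which JSON Schema features it accepts, so a Pydantic-generated schema may need adjusting before it is sent.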

Optional fields vs. hallucination

A subtle but important point from the video: when you mark a field as optional in the schema, the model has a legitimate way to report an absent value, namely null, so it is under no pressure to make something up. This is the schema doing work that prompting alone cannot reliably do. “Do not hallucinate missing fields” is an instruction the model may ignore under the right conditions; Optional[str] = None in a strict schema makes null a valid answer by construction. The schema cannot verify that a non-null value really appears in the document, so spot-check extracted values when accuracy matters.

Common misconceptions

“I can fix this with a better system prompt.” A well-crafted system prompt reduces the failure rate, but it cannot eliminate it. Structural constraints (tool choice, output_config.format) are categorical fixes; prompt wording is probabilistic.
“Temperature 0 guarantees consistent output format.” Temperature 0 makes the output more repeatable for a given prompt (though not strictly deterministic), but the most likely output can still be a markdown-fenced JSON blob with a preamble. Format consistency requires format enforcement.
“I should retry until it works.” Unbounded retry loops are how a single bad document turns into a surprise bill. One retry is the right ceiling. If it fails twice, the document is likely ambiguous enough that a human should look at it.
“The tool call pattern is just a hack; native JSON mode is the real answer.” The tool-call pattern teaches the underlying idea clearly and still works on any model that supports tool use. Native output_config.format is cleaner for production, but understanding forced tool choice makes you better at debugging when things go wrong.

Frequently asked questions

What is the difference between tool_choice: auto and tool_choice: {type: "tool", name: "..."}? With auto, the model decides whether to call a tool or reply with text. With the explicit name form, the model is forced to call that specific tool. For structured output use cases you almost always want the explicit form.
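The two forms side by side, as a small sketch. build_tool_choice is a hypothetical helper of my own; the dict shapes are the ones passed as tool_choice in the request.

```python
from typing import Optional


def build_tool_choice(tool_name: Optional[str] = None) -> dict:
    """Return the auto form when no name is given, else the forced form."""
    if tool_name is None:
        return {"type": "auto"}  # the model decides: tool call or plain text
    return {"type": "tool", "name": tool_name}  # the model must call this tool


auto_choice = build_tool_choice()
forced_choice = build_tool_choice("extract_invoice")

print(auto_choice)
print(forced_choice)
```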

Does output_config.format work with all Claude models? As of 2026, native structured outputs via output_config.format are supported on current Claude models. Check the Anthropic API changelog for the exact model version floor — older pinned model versions may not support it.

What happens if the document genuinely cannot satisfy the required fields? The model will attempt to fill required fields and may hallucinate if there is no value to extract. The Pydantic validate gate catches wrong types and missing fields, but it cannot tell an invented value from a real one as long as the shape is valid. For truly missing required fields, consider making them optional and handling nulls downstream rather than expecting the model to invent data.
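One possible shape for "optional plus downstream handling", assuming Pydantic v2. LenientInvoice, MUST_HAVE, and the routing labels are all names invented for this sketch.

```python
from typing import Optional

from pydantic import BaseModel


class LenientInvoice(BaseModel):
    # Everything is optional at extraction time, so null is always a valid answer.
    invoice_number: Optional[str] = None
    vendor: Optional[str] = None
    total: Optional[float] = None
    po_number: Optional[str] = None


# Fields the business logic needs, enforced in code rather than in the schema.
MUST_HAVE = ("invoice_number", "vendor", "total")


def missing_must_haves(invoice: LenientInvoice) -> list:
    """List the must-have fields the model reported as null."""
    return [name for name in MUST_HAVE if getattr(invoice, name) is None]


def route(invoice: LenientInvoice) -> str:
    """Send incomplete invoices to a person instead of guessing."""
    return "human_review" if missing_must_haves(invoice) else "auto_process"


complete = LenientInvoice(invoice_number="INV-001", vendor="Acme", total=99.5)
incomplete = LenientInvoice(invoice_number="INV-002", vendor="Acme")

print(route(complete))
print(route(incomplete), missing_must_haves(incomplete))
```

The trade-off is that the completeness check moves out of the schema and into your own code, where a null is an explicit signal rather than a validation failure.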

Should I use this pattern for all LLM calls, or only when I need JSON? Only when you need a specific structured shape. For open-ended generation, summarization, or conversation, forcing a tool call adds overhead with no benefit. Reserve structural enforcement for cases where your code parses the output programmatically.

Where this fits in the series

This tutorial is part of How Claude Actually Works, a course that builds up a complete mental model of Claude’s API, from tokenization and context windows through tool use, structured output, and multi-agent patterns. The invoice extractor covered here is explicitly called out as the capstone project for playlist 07 in the series. If you are following along in order, you have now seen the full structured-output pipeline end to end.
