How Claude Code's Agent Loop Works (and Why It Breaks)

June 23, 2026 · How Claude Actually Works (part 8)

▶ Watch on YouTube & subscribe to The Stack Underflow

If you have watched the earlier episodes in this series you already know that each call to messages.create returns a stop_reason — a signal telling you why the model stopped generating. That was a per-call decision. This episode zooms out one level: what happens when you wrap that single call inside a loop so the model can keep going until a task is fully finished? That structure is the agent loop, and it is both the engine that powers Claude Code and the source of nearly every runaway-bill horror story you have heard.

This tutorial traces the loop’s happy path, then methodically breaks it in four ways, each time showing the fix.

The one-sentence version: The agent loop is a while cycle that calls the model, runs whatever tool it requested, and only exits when the model says end_turn — and without explicit guardrails, that loop can spin forever.

The Basic Loop

The loop has exactly three moving parts:

1. Call messages.create
2. Inspect stop_reason
   ├─ "tool_use"  → execute the tool, append result, GOTO 1
   └─ "end_turn"  → return result to user ✓

As an ASCII diagram:

┌─────────────────────────────────┐
│         messages.create         │
└──────────────┬──────────────────┘
               │ response
     ┌─────────▼──────────┐
     │  stop_reason?       │
     └──┬─────────────┬───┘
        │ tool_use    │ end_turn
        ▼             ▼
   run tool        exit → user ✓
   append result

        └──────────────────────► (back to top)

A healthy run looks like this: the model loops three times, calling tools on each pass (costing a little bit of money on each round trip), then reaches end_turn and exits cleanly. Cost stays small and predictable. That is what working looks like.

Four Ways the Loop Breaks

The video walks through four distinct failure modes, each with a concrete fix. Here they are in order.

1. Infinite Loop — No Exit Condition

What happens: The model never emits end_turn. The loop counter spins past 20, 50, 100. The cost meter turns red. This is the classic “agent left running overnight” scenario.

Why it happens: Nothing in the code forces an exit. If the model gets confused, miscounts steps, or hits a tool that always returns ambiguous results, it just… keeps going.

The fix — an iteration ceiling:

MAX_ITERATIONS = 25
iteration = 0

while True:
    response = client.messages.create(...)
    iteration += 1

    if iteration >= MAX_ITERATIONS:
        return "Iteration cap reached. Returning to user."

    if response.stop_reason == "end_turn":
        break
    elif response.stop_reason == "tool_use":
        # run tool, append result
        ...

The ceiling is non-negotiable. It does not matter how confident the model’s text sounds — the cap comes first, always.

2. Premature Exit — Model Says “Done,” Task Isn’t

What happens: The model emits end_turn after a single lap and announces the task is complete. But the checklist of required deliverables still has unchecked boxes.

Why it happens: The model’s output is a suggestion, not ground truth. It might genuinely believe it finished, or it might be pattern-matching on phrases it has seen in training data that signal completion. The text “All set!” tells you nothing about whether work was actually done.

The fix — explicit done criteria:

“Branch on what has actually been done, not what the model claims is done.”

Maintain a task state object (a checklist, a set of required keys, a database row) that is updated by your code — not by parsing the model’s prose. Only treat the task as complete when that state object says so.

required = {"fetched_data", "validated_schema", "wrote_output"}
completed = set()

# After each tool result, update completed:
# completed.add("fetched_data")

if response.stop_reason == "end_turn":
    if completed >= required:
        break  # genuinely done
    else:
        # inject a reminder and keep looping
        messages.append({"role": "user", "content": "Not all steps are complete yet. Continue."})

3. Thrashing — Bouncing Between Tools Without Progress

What happens: The model alternates between two tools (say, search and lookup) repeatedly without making forward progress toward end_turn. Each round trip costs money; nothing gets done.

Why it happens: Vague tool descriptions make two tools look interchangeable. The model tries one, gets a partial answer, tries the other, gets another partial answer, and oscillates because it cannot decide which is authoritative.

The fix — sharp, unambiguous tool descriptions:

VagueSharp
search: searches for informationsearch: full-text search over the knowledge base; use when you need to find documents by keyword
lookup: looks up a recordlookup: fetch a single record by its exact UUID; use only when you already have the ID

When descriptions clearly differentiate when to use each tool, the model picks the right one the first time and moves on.

4. Too Many Tools — Decision Paralysis

Related to thrashing: if a single agent has access to fifteen tools, the model spends cognitive budget deciding among them instead of doing the task. The video’s rule is straightforward: fewer tools per agent, sharper choices. Split a large tool surface across specialized sub-agents if needed.

The Four Controls, Summarized

ControlWhat it preventsWhere it lives
Iteration ceilingInfinite loops / runaway costCode — a counter with a max
Explicit done criteriaPremature exits / false completionCode — a task state object
Tight tool descriptionsThrashing between similar toolsTool definition text
Scoped tools per agentDecision paralysisAgent architecture

Notice that all four of these are code-level guarantees, not prompting tricks. You cannot reliably prevent a runaway loop with a sentence like “stop when you are finished.” These controls belong in your scaffolding, not your system prompt.

Cost Compounds in Multi-Agent Systems

The video ends with a cost warning worth keeping in mind as the series progresses:

  • A single agent loop: baseline cost
  • A multi-agent team (4–7 agents): roughly 4–7× the baseline
  • A coordinated agent team per Anthropic’s documentation: roughly 15× the baseline

This is not a reason to avoid multi-agent architectures — it is a reason to instrument them. Keep the cost meter visible. Episodes 0303 and 0304 in this series cover spawning sub-agents and coordinating them; these cost ratios will matter when you get there.

Common Misconceptions

  • “If the model says it is done, it is done.” No. The model’s text is a suggestion. Your task state is the truth. Always track actual completion separately from the model’s narration.
  • “A high iteration cap is safer than a low one.” A higher cap just means you burn more money before the fail-safe kicks in. Start with a conservative ceiling and raise it only when you have evidence you need to.
  • “Prompt engineering can replace code-level guardrails.” Prompts are hints, not contracts. Iteration caps and done-criteria checks need to live in code where they are enforced unconditionally.
  • “Thrashing means the model is broken.” Usually it means the tool descriptions are ambiguous. Fix the descriptions before blaming the model.

Frequently Asked Questions

What is a good default iteration cap? It depends on your task complexity, but starting around 20–25 iterations is common for single-step coding or research tasks. Monitor real runs and adjust. If legitimate tasks routinely hit the ceiling, raise it — but always have a ceiling.

How do I track task state without over-engineering it? For simple tasks, a Python set of completed step names is enough. For production workflows, use a database row or a structured object that your tool-execution code updates after each successful tool call. The key principle: the state update happens in your code, not by parsing the model’s response text.

Why does multi-agent cost multiply so fast? Each sub-agent runs its own loop, which means each sub-agent makes multiple model calls. If a coordinator calls four workers and each worker averages five model calls, that is already 20+ API calls for one top-level task. Context windows also grow as tool results accumulate, which increases per-call token costs.

Can the loop be paused and resumed? Yes, and that is often a good idea for long-running tasks. Serialize the messages array (which contains the full tool-call history) to storage after each iteration. On resume, reload it and continue. The model has no persistent state between API calls — the message history is the state.

Where This Fits in the Series

This episode is lesson 8 of the “How Claude Actually Works” course. It builds directly on the stop_reason branching covered in the playlist 6 episode and sets the stage for the multi-agent coordination episodes (0303, 0304) coming later in the series. If you are watching in order, the next episode covers hooks — another code-level control that runs around the agent loop. Browse all tutorials to see the full course outline.

Found this useful? The deep version lives on YouTube — new breakdowns of how AI dev tools actually work, weekly.

Subscribe on YouTube →