How Claude Code's Agent Loop Works (and Why It Breaks)
▶ Watch on YouTube & subscribe to The Stack Underflow
If you have watched the earlier episodes in this series, you already know that each call to messages.create returns a stop_reason, a signal telling you why the model stopped generating. That was a per-call decision. This episode zooms out one level: what happens when you wrap that single call inside a loop so the model can keep going until a task is fully finished? That structure is the agent loop, and it is both the engine that powers Claude Code and the source of nearly every runaway-bill horror story you have heard.
This tutorial traces the loop’s happy path, then methodically breaks it in four ways, each time showing the fix.
The one-sentence version: The agent loop is a while cycle that calls the model, runs whatever tool it requested, and only exits when the model says end_turn. Without explicit guardrails, that loop can spin forever.
The Basic Loop
The loop has three moving parts: the model call, the stop_reason check, and the branch that either runs a tool or exits:
1. Call messages.create
2. Inspect stop_reason
   ├─ "tool_use" → execute the tool, append result, GOTO 1
   └─ "end_turn" → return result to user ✓
As an ASCII diagram:
    ┌─────────────────────────────────┐
    │         messages.create         │
    └──────────────┬──────────────────┘
                   │ response
         ┌─────────▼──────────┐
         │    stop_reason?    │
         └──┬─────────────┬───┘
            │ tool_use    │ end_turn
            ▼             ▼
        run tool      exit → user ✓
      append result
            │
            └──────────────────────► (back to top)
A healthy run looks like this: the model loops three times, calling tools on each pass (costing a little bit of money on each round trip), then reaches end_turn and exits cleanly. Cost stays small and predictable. That is what working looks like.
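The healthy run described above can be sketched in a few lines of Python. This is a minimal illustration, not Claude Code's actual implementation: `call_model` and `run_tool` are hypothetical callables standing in for messages.create and your own tool dispatcher, and responses are plain dicts rather than SDK objects. It deliberately has no guardrails yet.

```python
def basic_agent_loop(call_model, run_tool, messages):
    """Unguarded loop: call the model, run the requested tool, repeat until end_turn."""
    while True:
        response = call_model(messages)  # stand-in for client.messages.create
        messages.append({"role": "assistant", "content": response["content"]})
        if response["stop_reason"] == "end_turn":
            return response
        if response["stop_reason"] == "tool_use":
            result = run_tool(response["content"])
            messages.append({"role": "user", "content": result})
        else:
            # Any other stop reason is outside this sketch; fail loudly.
            raise RuntimeError(f"unhandled stop_reason: {response['stop_reason']}")


def make_scripted_model(stop_reasons):
    """Hypothetical stub that replays a fixed sequence of stop reasons."""
    script = iter(stop_reasons)

    def call_model(messages):
        reason = next(script)
        return {"stop_reason": reason, "content": f"step ending in {reason}"}

    return call_model


# Three tool passes, then a clean exit.
model = make_scripted_model(["tool_use", "tool_use", "tool_use", "end_turn"])
history = [{"role": "user", "content": "do the task"}]
final = basic_agent_loop(model, lambda request: "tool output", history)
```

Because the only exit is the model's own end_turn, a stub that never emits it would keep this loop running indefinitely, which is exactly the first failure mode covered next.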
Four Ways the Loop Breaks
The video walks through four distinct failure modes, each with a concrete fix. Here they are in order.
1. Infinite Loop — No Exit Condition
What happens: The model never emits end_turn. The loop counter spins past 20, 50, 100. The cost meter turns red. This is the classic “agent left running overnight” scenario.
Why it happens: Nothing in the code forces an exit. If the model gets confused, miscounts steps, or hits a tool that always returns ambiguous results, it just… keeps going.
The fix — an iteration ceiling:
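    MAX_ITERATIONS = 25

    def run_agent(client):  # hypothetical wrapper so the early return is valid
        iteration = 0
        while True:
            response = client.messages.create(...)  # arguments elided
            iteration += 1
            # Hard stop: checked before anything the model says.
            if iteration >= MAX_ITERATIONS:
                return "Iteration cap reached. Returning to user."
            if response.stop_reason == "end_turn":
                break
            elif response.stop_reason == "tool_use":
                # run tool, append result
                ...
        return response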
The ceiling is non-negotiable. It does not matter how confident the model’s text sounds — the cap comes first, always.
2. Premature Exit — Model Says “Done,” Task Isn’t
What happens: The model emits end_turn after a single lap and announces the task is complete. But the checklist of required deliverables still has unchecked boxes.
Why it happens: The model’s output is a suggestion, not ground truth. It might genuinely believe it finished, or it might be pattern-matching on phrases it has seen in training data that signal completion. The text “All set!” tells you nothing about whether work was actually done.
The fix — explicit done criteria:
“Branch on what has actually been done, not what the model claims is done.”
Maintain a task state object (a checklist, a set of required keys, a database row) that is updated by your code — not by parsing the model’s prose. Only treat the task as complete when that state object says so.
    required = {"fetched_data", "validated_schema", "wrote_output"}
    completed = set()

    # After each successful tool result, your code updates completed:
    # completed.add("fetched_data")

    # Inside the agent loop:
    if response.stop_reason == "end_turn":
        if completed >= required:  # every required step has been recorded
            break  # genuinely done
        else:
            # inject a reminder and keep looping
            messages.append({"role": "user", "content": "Not all steps are complete yet. Continue."})
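One way to keep the state update in your own code is to map each tool to the checklist step it satisfies and record that step only when the tool call succeeds. This is a minimal sketch: the tool names, the `TOOL_TO_STEP` mapping, and the `record_tool_result` and `is_done` helpers are hypothetical, and "success" is simplified to a boolean reported by your tool runner.

```python
REQUIRED_STEPS = {"fetched_data", "validated_schema", "wrote_output"}

# Hypothetical mapping from tool name to the checklist step it completes.
TOOL_TO_STEP = {
    "fetch_data": "fetched_data",
    "validate_schema": "validated_schema",
    "write_output": "wrote_output",
}


def record_tool_result(completed, tool_name, succeeded):
    """Mark a step done only when its tool actually ran and succeeded."""
    step = TOOL_TO_STEP.get(tool_name)
    if succeeded and step is not None:
        completed.add(step)
    return completed


def is_done(completed):
    """The task is complete when every required step has been recorded."""
    return completed >= REQUIRED_STEPS


completed = set()
record_tool_result(completed, "fetch_data", succeeded=True)
record_tool_result(completed, "validate_schema", succeeded=False)  # failed call: not recorded
remaining = sorted(REQUIRED_STEPS - completed)
```

The model's prose never touches `completed`. The `remaining` list can also be used to make the reminder message specific about which steps are still open.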
3. Thrashing — Bouncing Between Tools Without Progress
What happens: The model alternates between two tools (say, search and lookup) repeatedly without making forward progress toward end_turn. Each round trip costs money; nothing gets done.
Why it happens: Vague tool descriptions make two tools look interchangeable. The model tries one, gets a partial answer, tries the other, gets another partial answer, and oscillates because it cannot decide which is authoritative.
The fix — sharp, unambiguous tool descriptions:
| Vague | Sharp |
|---|---|
| search: searches for information | search: full-text search over the knowledge base; use when you need to find documents by keyword |
| lookup: looks up a record | lookup: fetch a single record by its exact UUID; use only when you already have the ID |
When descriptions clearly differentiate when to use each tool, the model picks the right one the first time and moves on.
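In the Messages API, each tool is declared with a name, a description, and an input_schema (a JSON Schema object), so the sharp descriptions from the table might live in definitions along these lines. The parameter names `query` and `record_id` are assumptions made for illustration.

```python
tools = [
    {
        "name": "search",
        "description": (
            "Full-text search over the knowledge base. "
            "Use when you need to find documents by keyword."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                # Parameter name is an assumption for this sketch.
                "query": {"type": "string", "description": "Keywords to search for."}
            },
            "required": ["query"],
        },
    },
    {
        "name": "lookup",
        "description": (
            "Fetch a single record by its exact UUID. "
            "Use only when you already have the ID."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                # Parameter name is an assumption for this sketch.
                "record_id": {"type": "string", "description": "Exact UUID of the record."}
            },
            "required": ["record_id"],
        },
    },
]
```

A list like this is passed as the tools argument of messages.create. The description text is the main signal the model has for choosing between the two tools.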
4. Too Many Tools — Decision Paralysis
Related to thrashing: if a single agent has access to fifteen tools, the model spends cognitive budget deciding among them instead of doing the task. The video’s rule is straightforward: fewer tools per agent, sharper choices. Split a large tool surface across specialized sub-agents if needed.
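One way to enforce "fewer tools per agent" in code is to give each sub-agent an explicit allowlist and reject any configuration that exceeds a budget. The role names, tool names, `tools_for` helper, and the limit of five are all hypothetical. Treat this as a sketch of the idea, not a prescribed architecture.

```python
MAX_TOOLS_PER_AGENT = 5  # assumed budget; tune for your own agents

# Hypothetical split of a large tool surface across specialized sub-agents.
AGENT_TOOLS = {
    "researcher": ["search", "lookup", "fetch_url"],
    "coder": ["read_file", "write_file", "run_tests"],
    "reviewer": ["read_file", "post_comment"],
}


def tools_for(role, all_tools):
    """Return only the tool definitions this role is allowed to see."""
    allowed = AGENT_TOOLS[role]
    if len(allowed) > MAX_TOOLS_PER_AGENT:
        raise ValueError(f"{role} has {len(allowed)} tools; limit is {MAX_TOOLS_PER_AGENT}")
    by_name = {tool["name"]: tool for tool in all_tools}
    return [by_name[name] for name in allowed]


# Placeholder definitions; real ones would carry descriptions and schemas.
all_tools = [{"name": name} for names in AGENT_TOOLS.values() for name in names]
researcher_tools = tools_for("researcher", all_tools)
```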
The Four Controls, Summarized
| Control | What it prevents | Where it lives |
|---|---|---|
| Iteration ceiling | Infinite loops / runaway cost | Code — a counter with a max |
| Explicit done criteria | Premature exits / false completion | Code — a task state object |
| Tight tool descriptions | Thrashing between similar tools | Tool definition text |
| Scoped tools per agent | Decision paralysis | Agent architecture |
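Notice that none of these four controls is a prompting trick: each lives in your scaffolding, whether as loop code, tool definitions, or agent architecture. You cannot reliably prevent a runaway loop with a sentence like “stop when you are finished.” The iteration ceiling and the done criteria in particular are guarantees only because code enforces them, so they belong in your scaffolding, not your system prompt.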
Cost Compounds in Multi-Agent Systems
The video ends with a cost warning worth keeping in mind as the series progresses:
- A single agent loop: baseline cost
- A multi-agent team (4–7 agents): roughly 4–7× the baseline
- A coordinated agent team per Anthropic’s documentation: roughly 15× the baseline
This is not a reason to avoid multi-agent architectures — it is a reason to instrument them. Keep the cost meter visible. Episodes 0303 and 0304 in this series cover spawning sub-agents and coordinating them; these cost ratios will matter when you get there.
Common Misconceptions
- “If the model says it is done, it is done.” No. The model’s text is a suggestion. Your task state is the truth. Always track actual completion separately from the model’s narration.
- “A high iteration cap is safer than a low one.” A higher cap just means you burn more money before the fail-safe kicks in. Start with a conservative ceiling and raise it only when you have evidence you need to.
- “Prompt engineering can replace code-level guardrails.” Prompts are hints, not contracts. Iteration caps and done-criteria checks need to live in code where they are enforced unconditionally.
- “Thrashing means the model is broken.” Usually it means the tool descriptions are ambiguous. Fix the descriptions before blaming the model.
Frequently Asked Questions
What is a good default iteration cap?
It depends on your task complexity, but starting around 20–25 iterations is common for single-step coding or research tasks. Monitor real runs and adjust. If legitimate tasks routinely hit the ceiling, raise it, but always have a ceiling.
How do I track task state without over-engineering it?
For simple tasks, a Python set of completed step names is enough. For production workflows, use a database row or a structured object that your tool-execution code updates after each successful tool call. The key principle: the state update happens in your code, not by parsing the model’s response text.
Why does multi-agent cost multiply so fast?
Each sub-agent runs its own loop, which means each sub-agent makes multiple model calls. If a coordinator calls four workers and each worker averages five model calls, that is already 20+ API calls for one top-level task. Context windows also grow as tool results accumulate, which increases per-call token costs.
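The arithmetic can be made concrete with a rough sketch under simplifying assumptions: every worker makes the same number of calls, the coordinator makes a single call of its own, and each call resends the full history so input tokens grow by a fixed amount per call. The token figures are invented for illustration, not measured.

```python
def total_model_calls(workers, calls_per_worker, coordinator_calls=1):
    """Count model calls for one top-level task (coordinator_calls is an assumption)."""
    return coordinator_calls + workers * calls_per_worker


def cumulative_input_tokens(calls, base_tokens, tokens_added_per_call):
    """Each call resends the whole history, so call i carries base + i * added tokens."""
    return sum(base_tokens + i * tokens_added_per_call for i in range(calls))


# Four workers averaging five calls each, plus one coordinator call.
calls = total_model_calls(workers=4, calls_per_worker=5)

# One five-call loop with a growing context versus a flat context.
growing = cumulative_input_tokens(calls=5, base_tokens=1000, tokens_added_per_call=500)
flat = cumulative_input_tokens(calls=5, base_tokens=1000, tokens_added_per_call=0)
```

With these made-up numbers, the growing-context loop sends twice the input tokens of the flat one, and that overhead is paid again inside every worker.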
Can the loop be paused and resumed?
Yes, and that is often a good idea for long-running tasks. Serialize the messages array (which contains the full tool-call history) to storage after each iteration. On resume, reload it and continue. The model has no persistent state between API calls — the message history is the state.
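A checkpoint can be as simple as a JSON file rewritten after each iteration. This sketch assumes the messages list holds only plain, JSON-serializable dicts (SDK response objects would need converting first), and the `save_messages` and `load_messages` helpers are hypothetical names.

```python
import json
import os
import tempfile


def save_messages(path, messages):
    """Write to a temp file, then swap it in, so a crash mid-write cannot corrupt the checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False, encoding="utf-8"
    ) as tmp:
        json.dump(messages, tmp)
        tmp_path = tmp.name
    os.replace(tmp_path, path)


def load_messages(path):
    """Reload the full history; this is all the state the loop needs to resume."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


checkpoint = os.path.join(tempfile.mkdtemp(), "messages.json")
history = [
    {"role": "user", "content": "do the task"},
    {"role": "assistant", "content": "calling a tool"},
    {"role": "user", "content": "tool output"},
]
save_messages(checkpoint, history)
resumed = load_messages(checkpoint)
```

On resume, the reloaded list is passed straight back into the loop. Persisting the iteration counter alongside it keeps the ceiling meaningful across restarts.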
Where This Fits in the Series
This episode is lesson 8 of the “How Claude Actually Works” course. It builds directly on the stop_reason branching covered in the playlist 6 episode and sets the stage for the multi-agent coordination episodes (0303, 0304) coming later in the series. If you are watching in order, the next episode covers hooks — another code-level control that runs around the agent loop. Browse all tutorials to see the full course outline.