Building a Multi-Agent Research System with Isolated Contexts

June 23, 2026 · How Claude Actually Works (part 29)

▶ Watch on YouTube & subscribe to The Stack Underflow

When a single agent tries to be a one-stop research shop — ingesting 40 sources, running 12 tool calls, juggling primary evidence, recent news, and counterarguments — it starts to contradict itself. The context window fills up, earlier facts get crowded out, and the final output reflects whatever happened to float near the top of the window rather than what was actually most important.

The fix is not a bigger context window. It is a better architecture: a coordinator that delegates, sub-agents with sealed context bubbles, scoped tools, and a synthesis step whose only job is to compose — not to fetch anything new.

The one-sentence version: Split your research work across isolated sub-agents with scoped tools, then funnel their structured, citation-tagged results into a synthesis agent that can only compose — never fetch — to get a report whose provenance you can actually prove.

The Problem: Context Flooding in a Single Agent

Imagine asking one agent to research a complex topic end-to-end. It searches, fetches pages, reads PDFs, evaluates counterarguments, and then tries to write a coherent report. By the time it is writing, the context window looks like a junk drawer: dead-end searches, half-parsed documents, redundant fetches, intermediate notes. The model’s own earlier output is buried under all of it.

At 40 sources and 12 tool outputs, contradictions start appearing — not because the underlying sources disagree (though they might), but because the model can no longer reliably attend to all of its own prior reasoning simultaneously. The output quality degrades in proportion to how full the context is.

The Architecture: Coordinator + Three Sub-Agents + Synthesizer

The solution is a four-component pipeline:

              ┌─────────────────┐
              │   Coordinator   │
              └───────┬─────────┘
          ____________|____________
         |            |            |
    ┌────▼────┐  ┌────▼────┐  ┌───▼─────┐
    │ Agent A │  │ Agent B │  │ Agent C │
    │ Primary │  │ Recent  │  │ Counter │
    │ Sources │  │  News   │  │  Args   │
    └────┬────┘  └────┬────┘  └───┬─────┘
         |____________|____________|
                      |
              ┌───────▼─────────┐
              │  Synthesizer    │
              │ (compose only)  │
              └─────────────────┘

Coordinator — sits at the top, dispatches work to the three sub-agents, and feeds their structured outputs into the synthesis step. It does not itself do research.

Sub-agents A, B, C — each tackles a different angle of the question simultaneously:

  • Agent A: primary sources and foundational evidence
  • Agent B: recent news and developments
  • Agent C: counterarguments and critiques

They run in parallel, which means the total wall-clock time is bounded by the slowest agent, not the sum of all three.

Synthesizer — receives the distilled outputs from all three sub-agents and composes the final report. Critically, its search and fetch tools are grayed out — disabled. It cannot go fetch new claims of its own. It can only compose from what it was given.

Isolated Context Bubbles

Each sub-agent operates inside its own context bubble. The messy intermediate work — searches that led nowhere, pages that turned out to be irrelevant, tool calls that returned noise — stays sealed inside that bubble. The sub-agent processes it, extracts what matters, and returns only a distilled result.

This means the coordinator’s context never gets polluted with the raw intermediary chaos. Only the summarized, structured output from each sub-agent flows upward. The coordinator’s window stays clean enough to reason reliably.

LayerWhat it seesWhat it does NOT see
CoordinatorSub-agent distilled outputsRaw searches, fetches, dead ends
Sub-agentIts own full contextOther sub-agents’ work
SynthesizerStructured claim+source mappingsAnything outside what coordinator sent

Scoped Tools Per Role

Tool access is deliberately scoped by role. Researchers (search, fetch) can go out to the web and retrieve documents. The synthesizer cannot. This is not an oversight — it is an intentional constraint.

If the synthesizer could fetch new sources, it might quietly introduce claims that were never verified by the sub-agents, breaking the provenance chain. By making it compose-only, you guarantee that everything in the final report traces back to something a sub-agent explicitly retrieved and cited.

// Sub-agent tool config
{
  "tools": ["search", "fetch"]
}

// Synthesizer tool config
{
  "tools": []
}

Simple, but consequential.

Provenance-Preserving Outputs

When a sub-agent returns its result, the payload is structured — not free-form prose. Each claim is paired with its source: a claim-to-source mapping. This is the provenance contract between layers.

{
  "claims": [
    {
      "claim": "Constitutional AI was introduced in 2022.",
      "source": "https://arxiv.org/abs/2212.08073",
      "quote": "We call this method 'Constitutional AI'..."
    }
  ]
}

The coordinator feeds these mappings into the synthesizer. The synthesizer weaves them into prose while preserving inline citations. The final report carries citations, and every citation traces back to the original source the sub-agent actually saw. Provenance survives synthesis — it does not get laundered away in the summarization step.

Verifying the Isolation

You can actually prove that the isolation held. Scan the coordinator’s messages array and assert that zero sub-agent tool-use blocks appear in it. If a sub-agent’s raw tool_use or tool_result blocks are leaking into the coordinator’s context, the architecture is broken.

def assert_no_subagent_tools_in_coordinator(coordinator_messages):
    for msg in coordinator_messages:
        if isinstance(msg.get("content"), list):
            for block in msg["content"]:
                assert block.get("type") not in ("tool_use", "tool_result"), \
                    f"Sub-agent tool block leaked into coordinator context: {block}"
    print("Green check — bubbles stayed sealed.")

This is not just a nice-to-have. If this assertion fails, the whole isolation argument falls apart and you are back to the single-agent context flooding problem, just with extra steps.

Common Misconceptions

  • “A bigger context window solves this.” A larger window buys time but does not fix the underlying attention and contradiction problem. Structured delegation does.
  • “The synthesizer should be able to look things up if it needs to.” Giving the synthesizer fetch access breaks provenance. If it adds claims you cannot trace, the citation chain is unverifiable. Keep it compose-only.
  • “Parallel sub-agents are more expensive.” They cost roughly the same tokens as sequential execution, but they complete faster and each agent works on a narrower, cleaner context — which actually reduces hallucination risk.
  • “Structured outputs from sub-agents are unnecessary overhead.” Free-form prose between agents is where provenance dies. The claim-to-source mapping is the data contract that makes the whole pipeline auditable.

Frequently Asked Questions

How do I pick what each sub-agent researches? Partition by information type, not by topic. Primary sources, recent developments, and counterarguments are a natural split because they require different retrieval strategies and have different freshness requirements. If your topic has other natural dimensions — geographic regions, stakeholder groups, time periods — those can work too.

What happens if a sub-agent returns conflicting claims? The synthesizer receives all three structured payloads simultaneously and can note the conflict in the final report, attributing each position to its source. This is actually a strength of the architecture: conflicts become explicit and citable rather than silently averaged away.

Can I add more than three sub-agents? Yes, up to the point where the coordinator’s context fills with distilled outputs. In practice, three to five sub-agents is a reasonable range before you need to think about hierarchical coordination — coordinators coordinating coordinators.

Does the synthesizer ever need tools at all? For pure research aggregation: no. If your synthesis step also needs to format output (e.g., render a chart, call a formatting API), you can add narrow formatting tools. The key constraint is no fetch or search — those must stay with the researchers.

Where This Fits in the Series

This scenario is part of the “How Claude Actually Works” course, which builds from token-level mechanics up to production-grade multi-agent patterns. The multi-agent research system is scenario 3 in the applied section — it puts context management, tool scoping, and structured outputs together into one coherent, verifiable architecture. Browse all tutorials to see where this fits in the full sequence.

Found this useful? The deep version lives on YouTube — new breakdowns of how AI dev tools actually work, weekly.

Subscribe on YouTube →