Context Engineering: What the Model Sees Is What You Design

June 23, 2026 · The Hidden Cost of AI Coding (part 7)


If you remember 2022-era “prompt engineering,” you remember when the skill was essentially copywriting for robots — get the phrasing right, nail the tone, add a few examples. That made sense when a conversation was a single turn. It does not make sense when your agent is ten steps deep into a loop and your original prompt is a footnote in a 50,000-token window.

This is the finale of Playlist 3 of How Claude Actually Works, and it ties every previous episode into a single discipline with a name: context engineering. Not a vibe, not a checklist — an engineering discipline with specific failure modes and specific fixes.

The one-sentence version: Context engineering is the practice of deliberately designing everything the model sees when it decides — not just what you typed, but system instructions, tool schemas, retrieved docs, conversation history, tool outputs, and user memory.

Prompt Engineering vs. Context Engineering

The clearest way to see the difference is to put them side by side:

Dimension     | Prompt Engineering                             | Context Engineering
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Focus         | How you communicate (tone, phrasing, examples) | What information the model has access to
Primary input | The prompt itself                              | System instructions, tool schemas, history, retrieved docs, memory
Era           | Single-turn chat (2022)                        | Multi-turn agents (now)
Analogy       | Crafting a question                            | Curating a briefing packet

The moment AI moved from chat to agents — from one turn to many — the prompt stopped being the dominant input. By step five of an agent loop, your original prompt is a small fraction of what the model sees. If you only optimize the wording, you optimize a thin slice of what actually drives the output.
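To make the "thin slice" concrete, here is a back-of-the-envelope sketch. All the token counts are hypothetical numbers chosen for illustration, not measurements from any real agent: a 300-token prompt, 2,000 tokens of system instructions and tool schemas, and 4,000 tokens of tool output and history added per loop step.

```python
def prompt_share(prompt_tokens: int, fixed_tokens: int, tokens_per_step: int, step: int) -> float:
    """Fraction of the window occupied by the original prompt after `step` loop iterations."""
    total = prompt_tokens + fixed_tokens + tokens_per_step * step
    return prompt_tokens / total


# Hypothetical sizes (assumptions for this sketch, not measured values).
PROMPT, FIXED, PER_STEP = 300, 2_000, 4_000

for step in (0, 1, 5, 10):
    share = prompt_share(PROMPT, FIXED, PER_STEP, step)
    print(f"step {step:>2}: prompt is {share:.1%} of the window")
```

Under these assumed numbers the prompt is roughly 13% of the window before the loop starts and about 1.3% by step five. The exact figures depend entirely on your agent, but the direction of travel is the point.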

You are not writing prompts anymore. You are curating the entire information environment the model operates in. The prompt is just one input among many.

Four Failure Modes (Why Contexts Break)

Context engineering is a named discipline because context fails in specific, repeatable ways. There are four of them:

1. Poisoning — Wrong information enters the context and corrupts subsequent outputs. The model keeps consulting the bad source because it is sitting right there in the window. Garbage in, garbage (confidently) out.

2. Distraction — Too much accumulated content drowns out the real instruction. If you have seen a long agent session start ignoring your system prompt because 40,000 tokens of tool outputs have piled up on top of it, this is what happened. (This was the widget.tsx story from Episode 4.)

3. Confusion — Ambiguous or overlapping content leaves the model unsure which source to follow. Two tool descriptions that do similar things. Two memory entries that say slightly different things. The model picks one arbitrarily, or hedges.

4. Clash — Two pieces of context give directly contradictory instructions. The model has no internal tiebreaker. It cannot know which one wins. Neither can you, reliably.

Each failure mode has a fix. That is what makes this engineering rather than advice.
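Two of these failures can sometimes be spotted before the model ever sees the window. The sketch below is a rough lint, not a standard tool: it flags tool descriptions that look near-identical (a Confusion risk) and memory keys stored with more than one value (a Clash risk). The function names, the word-overlap heuristic, the 0.6 threshold, and the sample data are all assumptions made for illustration.

```python
from itertools import combinations


def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two strings."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def overlapping_tools(tools: dict[str, str], threshold: float = 0.6) -> list[tuple[str, str]]:
    """Confusion check: pairs of tools whose descriptions look near-identical."""
    return [
        (name_a, name_b)
        for (name_a, desc_a), (name_b, desc_b) in combinations(tools.items(), 2)
        if word_overlap(desc_a, desc_b) >= threshold
    ]


def conflicting_memory(entries: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Clash check: memory keys that were recorded with more than one value."""
    seen: dict[str, set[str]] = {}
    for key, value in entries:
        seen.setdefault(key, set()).add(value)
    return {key: values for key, values in seen.items() if len(values) > 1}


# Hypothetical tool descriptions and memory entries.
tools = {
    "search_docs": "search the project documentation for a query",
    "find_docs": "search the project documentation for a keyword",
    "run_tests": "run the unit test suite and report failures",
}
memory = [("indent", "tabs"), ("indent", "spaces"), ("language", "TypeScript")]

print(overlapping_tools(tools))    # flags the search_docs / find_docs pair
print(conflicting_memory(memory))  # flags the "indent" key
```

Word overlap is a crude proxy; it will miss tools that are described differently but do the same thing. The design point is only that these checks run on the context, not on the model.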

Four Levers (The Toolkit You Already Have)

Here is the useful surprise: if you have followed this playlist, you already know the fixes. Four of the earlier episodes each covered one lever:

Failure Mode  →  Lever       →  Playlist Episode
──────────────────────────────────────────────────
Poisoning     →  Select      →  Ep 6 (model tiering — right model, right task)
Distraction   →  Compress    →  Ep 1 (token budgeting — summarise long history)
Confusion     →  Structure   →  Ep 5 (caching — stable content first, variable last)
Clash         →  Isolate     →  Ep 4 (sub-agents — clean windows per sub-task)

Let’s unpack each:

  • Select — Choose which tokens belong in the window in the first place. Use a cheaper model for small tasks so you do not dump irrelevant context into a flagship’s window. Do not retrieve everything; retrieve what is relevant.
  • Compress — Summarise long conversation history so it costs fewer tokens and keeps the signal-to-noise ratio high. Raw history is expensive and distracting; a summary is cheap and focused.
  • Structure — Put stable content (system instructions, tool schemas) before variable content (user message, retrieved docs) so prompt caching actually hits. Bad ordering kills your cache hit rate.
  • Isolate — Break long tasks into sub-agents, each with its own clean context window, instead of letting one session rot over dozens of steps. Context rot is the long-session equivalent of memory leaks.
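Three of the levers above (Select, Compress, Isolate) can be sketched in a few lines each. This is a toy version under loud assumptions: relevance is plain word overlap rather than a real retriever, the summariser is a placeholder where a real system would likely call a model, and every function name is hypothetical.

```python
def select(docs: list[str], query: str, top_k: int = 2) -> list[str]:
    """Select: keep only the documents that share the most words with the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in docs]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in relevant[:top_k]]


def compress(history: list[str], keep_recent: int = 2, summarise=None) -> list[str]:
    """Compress: replace older turns with one summary line, keep recent turns verbatim."""
    if len(history) <= keep_recent:
        return list(history)
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    # Placeholder summariser (an assumption); a real one would likely be a model call.
    summarise = summarise or (lambda turns: f"[summary of {len(turns)} earlier turns]")
    return [summarise(older)] + recent


def isolate(subtask: str, docs: list[str]) -> list[str]:
    """Isolate: build a fresh window for a sub-agent, with no parent history in it."""
    return [f"Task: {subtask}"] + select(docs, subtask)


# Hypothetical documents and conversation turns.
docs = [
    "billing api uses invoice ids",
    "deploy guide for staging",
    "invoice retries and billing errors",
]
history = ["t1", "t2", "t3", "t4", "t5"]

print(select(docs, "fix billing invoice retries"))
print(compress(history))
print(isolate("billing retries", docs))
```

Note what `isolate` does not take as an argument: the parent's history. The sub-agent starts from the task and the selected documents only, which is the whole point of the lever.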

Six episodes, four levers, one discipline.

ASCII View: The Context Window as a System

┌─────────────────────────────────────────────────────┐
│  What the model sees at inference time              │
│                                                     │
│  [System prompt / instructions]   ← STABLE (cache) │
│  [Tool schemas]                   ← STABLE (cache) │
│  [Retrieved documents]            ← VARIABLE       │
│  [Conversation history (compressed)] ← COMPRESSED  │
│  [Tool outputs]                   ← VARIABLE       │
│  [User message]                   ← VARIABLE       │
└─────────────────────────────────────────────────────┘

  Context engineering = designing this entire box,
  not just the bottom line.
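The Structure lever can be sketched as a window builder that always emits the stable parts first. This is a minimal illustration, not any provider's caching API: `build_window` and `fingerprint` are hypothetical helpers, and the hash is only a stand-in for "would a prefix cache see the same bytes?". Real caching rules differ per provider.

```python
import hashlib


def build_window(system: str, tool_schemas: list[str], variable_parts: list[str]) -> tuple[str, str]:
    """Return (stable_prefix, full_window), with stable content placed first."""
    # Sorting the schemas keeps the prefix byte-identical even if the caller
    # passes the same tools in a different order.
    stable_prefix = "\n".join([system, *sorted(tool_schemas)])
    full_window = "\n".join([stable_prefix, *variable_parts])
    return stable_prefix, full_window


def fingerprint(text: str) -> str:
    """Short hash used here only to compare prefixes across turns."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


# Hypothetical system prompt, tool schemas, and turns.
system = "You are a code review assistant."
schemas = ["tool: run_tests()", "tool: read_file(path)"]

prefix_1, window_1 = build_window(
    system, schemas, ["history: (empty)", "user: review widget.tsx"]
)
prefix_2, window_2 = build_window(
    system, list(reversed(schemas)), ["history: [summary]", "user: now fix the lint errors"]
)

print(fingerprint(prefix_1) == fingerprint(prefix_2))  # True: the stable prefix did not change
print(window_1 == window_2)                            # False: only the variable tail differs
```

If the user message or retrieved documents were placed ahead of the system prompt instead, the first bytes of the window would change every turn and there would be no shared prefix to reuse.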

One Honest Caveat: The Terminology Is Alive

The discipline is still naming itself. Some practitioners are already moving past “context engineering” to call this harness engineering — managing not just the information the model sees, but the rules, feedback loops, and safety rails around it.

The terminology may shift. The mechanics underneath are stable. Both names point at the same truth: the model is one component of a system you design. Optimising the model in isolation is like tuning a database engine while ignoring the schema, the indexes, and the queries sent to it.

Common Misconceptions

  • “Better prompts fix most agent problems.” Not once the agent is multi-step. By step five, your prompt is a minority of the input. The fix is usually structural — what is in the window, in what order, from what sources.

  • “Context engineering is just RAG.” Retrieval-Augmented Generation is one technique under the Select lever. Context engineering is the full discipline: Select, Compress, Structure, and Isolate together.

  • “The model handles contradictions internally.” It does not. When two instructions clash, the model has no deterministic way to resolve them. You have to prevent the clash at the context-design stage.

  • “This is only relevant for large agent systems.” Even a simple chatbot with a few tools benefits from structured context. The failure modes appear earlier than you expect once history accumulates.

Frequently Asked Questions

What is the practical difference between a “prompt” and “context”? A prompt is the text you explicitly write as a user or developer instruction. Context is everything the model reads before it generates a token — system prompt, tool schemas, retrieved chunks, conversation history, tool call results, and your prompt. Prompt engineering tunes one of those inputs; context engineering designs all of them.

How do I know which failure mode is hitting my agent? Look at the symptom: outputs that were correct and then degraded (poisoning or distraction), outputs that pick the wrong tool or source (confusion), or outputs that oscillate between two behaviors (clash). Compress or isolate for degradation over time; restructure tool schemas for confusion; audit contradictions in your system prompt and retrieved content for clash.
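The triage in that answer can be written down as a small lookup. This is only a sketch: the symptom labels are hypothetical names invented here, the symptom-to-failure mapping paraphrases the answer above, and the failure-to-lever mapping is copied from the table earlier in this post. Treat the returned lever as a first thing to try, not a verdict.

```python
# Failure mode -> lever, as in the mapping table earlier in this post.
LEVER_FOR = {
    "poisoning": "select",
    "distraction": "compress",
    "confusion": "structure",
    "clash": "isolate",
}

# Symptom -> candidate failure modes. The symptom keys are hypothetical
# labels chosen for this sketch.
SUSPECTS_FOR = {
    "degrades_over_time": ["poisoning", "distraction"],
    "wrong_tool_or_source": ["confusion"],
    "oscillates": ["clash"],
}


def diagnose(symptom: str) -> list[tuple[str, str]]:
    """Return (failure mode, lever to try first) pairs for an observed symptom."""
    return [(mode, LEVER_FOR[mode]) for mode in SUSPECTS_FOR.get(symptom, [])]


print(diagnose("degrades_over_time"))
# [('poisoning', 'select'), ('distraction', 'compress')]
```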

Is context engineering just for Claude, or does it apply to other models? The four failure modes (poisoning, distraction, confusion, clash) follow from how any model conditions its output on whatever is in its context window, not from the quirks of any one model. The levers apply to GPT-4o, Gemini, Llama, or any other context-window-based model. The specific caching mechanics differ per provider, but the discipline is universal.

What comes after context engineering? The video mentions “harness engineering” as the next evolution — managing not just information but the feedback loops, validation layers, and safety rails around the model. Think of context engineering as the information-plane discipline and harness engineering as the control-plane discipline. The next playlist, Agents at Scale, covers the 2026 frontier.

Where This Fits in the Series

This episode is the finale of Playlist 3 of the How Claude Actually Works course on The Stack Underflow. The three playlists converge on a single job: building reliable, affordable, predictable AI systems. Playlist 1 covered the editor (how Claude interacts with code); Playlist 2 covered the agent (how Claude takes actions); Playlist 3 covered the economics (tokens, caching, rot, and now the discipline that ties them together). Context engineering is the name for most of the engineering that sits around the model — and the model, it turns out, is the easy part.
