Tutorials
Follow a series in order, or jump straight to the one you searched for.
Agents at Scale: The 2026 Frontier
-
1. Why AI Agent Projects Fail Between Pilot and Production
86–89% of enterprise AI agent pilots never reach production. Learn the four structural failure modes that kill agent projects and how to avoid them.
-
2. The 6 Multi-Agent Patterns That Actually Work in 2026
Orchestrator-worker, routing, pipeline, parallel fan-out, reflection, and debate — the six multi-agent patterns worth using, when each one pays off, and when a single agent still wins.
-
3. A2A vs MCP: How Agent-to-Agent and Agent-to-Tool Protocols Differ
Learn how the A2A and MCP protocols work together in multi-agent AI systems: MCP connects agents to tools, and A2A connects agents to each other.
-
4. How Subagent Isolation Prevents Context Rot in LLM Agents
Learn how spawning fresh-context subagents structurally eliminates the accuracy decay that plagues long single-agent sessions.
-
5. Agent Observability: How to Trace and Debug AI Agents with OpenTelemetry
Learn why status codes lie in agent systems and how to use OpenTelemetry GenAI semantic conventions to trace handoffs, tool I/O, and state mutations.
-
6. What Is Harness Engineering? How the System Around the Model Determines AI Reliability
Learn why the agent harness — not the model — is the real differentiator in 2026 AI systems, and what the five harness layers actually are.
How Claude Actually Works
-
1. How Claude Works: A 5-Layer Mental Model for Developers
Learn the 5-layer Claude Stack — model, Messages API, MCP tools, agent loop, and surfaces — and how they compose into real AI-powered apps.
-
2. Claude's 5-Layer Stack: MCP, Hooks, Skills, and Subagents Explained
A structured map of every Claude feature across 5 layers and 2 cross-cutting planes, so you can place any new Anthropic release instantly.
-
3. How LLM Tokens Work — And Why They Explain Your AI Bill
Claude never sees your words — it sees tokens. Here's what tokenization actually is, why it drives every dollar of your AI bill, and how to reason about token cost.
-
4. How Claude's Context Window Works: Limits, Costs, and Overflow
Learn what fills Claude's context window, which models support 1M tokens, when pricing changes, and what happens when you overflow it.
-
5. Understanding stop_reason in the Claude Messages API
Learn how stop_reason controls agent loop branching in the Claude API — end_turn, tool_use, max_tokens, pause_turn, and more explained.
-
6. How Claude Tool Calling Actually Works: The Request-Execute Model
Learn how Claude's tool calling works under the hood: the model requests, your code executes. Covers tool use blocks, the three executor lanes, and schema tips.
-
7. What Is MCP (Model Context Protocol) and How It Works
Learn how MCP reduces AI tool integrations from N×M to N+M by giving every client and tool a single shared protocol to speak.
-
8. How Claude Code's Agent Loop Works (and Why It Breaks)
Learn how Claude's agent loop works, what causes infinite loops and premature exits, and the four controls that keep agents on track.
-
9. Claude Code Hooks Explained: Deterministic Guards for the Agent Loop
Learn how Claude Code hooks enforce hard rules at every lifecycle event — pre-tool, post-tool, session start/stop — that the model cannot override.
-
10. How to Guarantee JSON Output from Claude with Structured Outputs
Learn how to use tool choice, Pydantic schemas, and Claude's native structured output to get reliable JSON from LLMs every time — no parser explosions.
-
11. How to Pin Model Output Format Using Few-Shot Examples
Learn why few-shot examples in the messages array beat temperature tweaks for fixing format, locale, and edge-case failures in LLM output.
-
12. Temperature, Top-P, and Top-K Explained: Controlling LLM Randomness
Learn how temperature, top-p, and top-k sampling parameters shape an LLM's output distribution and when to tune each for your use case.
-
13. How to Write Acceptance Criteria for LLM Output (Not Just 'Be Accurate')
Stop writing vague prompts. Learn how to define testable acceptance criteria for LLM output — covering format, edge cases, missing data, and ambiguity.
-
14. Confidence Fields and Human-in-the-Loop Routing for LLM Extraction Pipelines
Learn how to add a confidence field to LLM tool schemas and route low-confidence extractions to a human review queue automatically.
-
15. How Claude Code Edits Your Repo: Inside the Agentic Edit Loop
Claude Code doesn't regenerate whole files — it reads, locates, and makes surgical string edits in a verify loop. Here's exactly how an agentic code editor changes your repo, step by step.
-
16. How CLAUDE.md File Hierarchy Works: User, Project, Subtree, Local
Learn how Claude Code's four-tier CLAUDE.md hierarchy (user, project, subtree, local) controls which rules apply, and how precedence is resolved.
-
17. Claude Code Extensions: Skills, Subagents, Hooks, and Plugins
Learn the four extension points in Claude Code — skills, subagents, hooks, and plugins — and when to reach for each one.
-
18. Anthropic Agent SDK: Use Claude Code's Engine in Your App
Learn how the Anthropic Agent SDK exposes Claude Code's agent loop, built-in tools, and MCP support as a library you call from Python or TypeScript.
-
19. Anthropic Managed Agents: Claude Runs the Loop for You
Learn how Anthropic's managed agents API offloads the agent loop, tool sandbox, and persistent state to Anthropic's cloud, replacing hand-rolled loops.
-
20. How to Run Claude Code Headlessly in CI/CD Pipelines
Learn how to use claude -p to run Claude Code headlessly in CI/CD: reviewing PRs, generating changelogs, and scripting AI tasks without an interactive shell.
-
21. How Prompt Caching Cuts Your AI Bill ~90% (and the Floor Trap)
Prompt caching reuses a stable prompt prefix so you stop paying full price to resend the same context. Here's how it works, the real numbers, and the floor trap that quietly costs you money.
-
22. Context Engineering: Pin, Summarize, Prune, and Compact
Learn four techniques to keep long Claude sessions coherent and affordable: pin stable facts, summarize resolved turns, prune tool output, and compact under pressure.
-
23. How to Write LLM Evals: Testing AI Apps with Real Data
Learn how to replace gut-feel LLM testing with a real eval harness: datasets, graders, CI score gates, and LLM-as-judge caveats.
-
24. Prompt Injection Attacks Explained: How to Defend Your AI Agent
Learn how prompt injection turns untrusted text into commands, why indirect injection is the dangerous case, and three layered defenses every agent needs.
-
25. Agent Escalation: When to Hand Off to a Human vs. Keep Handling
Learn the four exact signals that should trigger agent-to-human escalation, why sentiment is never one of them, and how to structure a clean handoff summary card.
-
26. How Claude Token Billing Works: Input, Output, and Cache Costs
Learn how Claude breaks API costs into input, output, and cached tokens, why output tokens cost 5x as much as input tokens, and how to fix the three most common cost leaks in agents.
-
27. How to Structure a Production Claude Agent: All Layers Explained
Learn the full layered architecture of a production Claude agent: gateway, model router, agent loop, scoped tools, policy hooks, prompt caching, and clean escalation.
-
28. Building a Customer Support Agent with Claude: Tools, Policy Hooks, and Escalation
Learn how to build a production Claude agent with scoped tools, code-enforced policy caps, structured errors, clean escalation, and a logging plane.
-
29. Building a Multi-Agent Research System with Isolated Contexts
Learn how to build a multi-agent research pipeline with a coordinator, scoped sub-agents, and provenance-preserving synthesis — without context overflow.
-
30. How to Build a Structured Data Extraction Pipeline with Claude
Learn how to wire forced schemas, few-shot examples, validation retry, confidence routing, and prompt caching into one production extraction flow.
-
31. Claude Architecture Explained: 5 Layers and 2 Cross-Cutting Planes
A complete recap of the Claude stack: model, protocol, reach, orchestration, and surfaces — plus the two planes that cut through every layer.
-
32. CCA-F Exam Study Guide: Claude Certified Architect Foundations
Map every CCA-F exam domain to the Claude stack layers, understand score weights, and build a focused study path using official episode blocks.
The Hidden Cost of AI Coding
-
1. What Is a Token in AI? How AI Coding Tools Are Priced
Learn what tokens are, why input and output tokens cost different amounts, and how token pricing affects your AI coding bills.
-
2. Why AI Agent API Costs Are So Much Higher Than Chatbots
62% of an AI agent's API bill is the model rereading prior context on every step. Learn exactly why agentic workloads cost 5–30x more than chatbots.
-
3. Context Window Limits: Why 200K Tokens Isn't Really 200K
Learn why advertised context window sizes are misleading, what the real working limit is, and how token position affects model recall.
-
4. What Is Context Rot and Why AI Agents Degrade Over Time
Learn why AI agents get worse during long sessions even when the context window isn't full — and the three compounding mechanisms behind context rot.
-
5. Prompt Caching: How Anthropic and OpenAI Differ (and the Catch)
Learn how prompt caching works on Anthropic and OpenAI, what the break-even math looks like, and why prefix order can make or break your hit rate.
-
6. How to Reduce AI Coding Costs 40-60% with Model Tiering
Learn how routing agent tasks to the right model tier (Haiku vs Sonnet vs Opus) cuts AI coding spend 40-60% with no measurable quality loss.
-
7. Context Engineering: What the Model Sees Is What You Design
Learn how context engineering replaced prompt engineering: four failure modes, four levers, and why curating the model's information environment is the real job.
AI Agent Internals: How Coding Agents Really Work
-
1. How AI Agents Use Tools: The Model vs. Orchestrator Split
Learn how AI agents actually execute tools — the model only writes text; a separate orchestrator reads, validates, and runs the real actions.
-
2. MCP Explained: How AI Agents Connect to Any Tool
Learn how the Model Context Protocol (MCP) uses hosts, clients, and servers to let any AI editor connect to any tool without custom integration code.
-
3. The Agent Loop and Supervision Contracts in AI Coding Tools
Learn how the agent loop works in AI coding agents and why the real difference between Cursor, Claude Code, and Devin is where the human sits inside it.
-
4. How a Chat Message Becomes a GitHub API Call: Full Stack Trace
Trace exactly how a typed English sentence travels through 8 nodes — editor, model, MCP client/server, GitHub API — and comes back as a created issue.
-
5. How MCP Apps Work: Tools That Return Interactive UI
MCP tools can return real interactive UI — not just text. Here's how MCP Apps render forms, dashboards, and widgets inside Claude and ChatGPT, with a minimal working example.
-
6. Why MCP Tools Disappear: Editor Modes and Permission Gates
Learn why MCP tools vanish in some editor modes — Ask, Edit, and Agent modes control which capabilities fire, and that gate is a design choice.
The Engine Behind Every AI Code Editor
-
1. Why Cursor, Windsurf, and Copilot Are All Built on VS Code
Learn why major AI coding tools such as Cursor, Windsurf, and GitHub Copilot run on VS Code, and how VS Code's multi-process architecture makes that possible.
-
2. How VS Code Isolates Extensions to Prevent Editor Crashes
Learn how VS Code's extension host process keeps your editor alive when an extension hangs or crashes, and why this design enables forks like Cursor.
-
3. How Autocomplete and Go-to-Definition Actually Work in VS Code
Learn how the Language Server Protocol (LSP) powers autocomplete and go-to-definition — and why your editor never changed when AI arrived.
-
4. How AI Agents Run Terminal Commands in VS Code
Learn how AI agents like Claude interact with VS Code's terminal via a pseudo-terminal and shell integration markers to run and monitor commands.
-
5. How VS Code Remote Development Works: SSH, Containers, and Tunnels
Learn how VS Code splits its processes across local and remote machines using the VS Code Server and network tunnels — same architecture, longer wires.