How LLM Tokens Work — And Why They Explain Your AI Bill

June 23, 2026 · How Claude Actually Works (part 3)

▶ Watch on YouTube & subscribe to The Stack Underflow

Here’s the thing almost nobody internalizes about large language models: Claude never reads your words. It reads tokens — numbers. Your prompt is chopped into pieces, each piece is mapped to an integer, and the model only ever sees those integers. Every limit you hit, every bill you pay, and half the weird behavior you’ve seen traces back to this one fact.

This tutorial explains what a token actually is, why the model works in tokens instead of words, and how that single design choice explains your AI bill.

The one-sentence version: text is split into tokens (chunks roughly ¾ of a word on average), each token maps to a number, and you pay per token — in and out — so understanding tokens is understanding cost.

What a token actually is

A token is a chunk of text: often a whole word, but frequently a piece of a word, a space, or a punctuation mark. The tokenizer splits text into chunks drawn from a fixed vocabulary and maps each chunk to an integer ID.

Rough intuition:

Common words (the, code, error) are usually one token.
Rare or long words split into several tokens (for example, tokenization might become token + ization; the exact split depends on the tokenizer).
Whitespace and punctuation are tokens too.
A useful rule of thumb in English: ~4 characters per token, or ~0.75 words per token.
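To turn that rule of thumb into a quick estimate, here is a minimal Python sketch. The function names and the 4.0 and 0.75 defaults are illustrative assumptions taken directly from the heuristic above. They are only rough for English prose; exact counts require the model's own tokenizer.

```python
def estimate_tokens_from_chars(text: str, chars_per_token: float = 4.0) -> int:
    """Rough English estimate from the ~4 characters per token rule of thumb."""
    if not text:
        return 0
    return max(1, round(len(text) / chars_per_token))

def estimate_tokens_from_words(text: str, words_per_token: float = 0.75) -> int:
    """Rough English estimate from ~0.75 words per token (about 1.33 tokens per word)."""
    return round(len(text.split()) / words_per_token)

prompt = "Explain why my build fails with a missing symbol error."
print(estimate_tokens_from_chars(prompt))  # 14  (55 characters / 4)
print(estimate_tokens_from_words(prompt))  # 13  (10 words / 0.75)
```

The two heuristics disagree by a token, which is normal for estimates. Treat either one as a ballpark, not a billing figure.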

So “How tokens work” isn’t 3 words to the model. It’s a sequence of integer IDs such as [4438, 11460, 990] (illustrative; the exact pieces and numbers depend on the tokenizer). The model does math on those numbers. The English you typed was never seen.
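Here is that lookup step as a toy Python sketch. The three-entry vocabulary is hypothetical and simply reuses the illustrative IDs above; a real tokenizer has a far larger vocabulary and its own splitting rules.

```python
# Hypothetical three-entry vocabulary reusing the illustrative IDs from the text.
# These are NOT real Claude token IDs.
TOY_VOCAB = {"How": 4438, " tokens": 11460, " work": 990}
ID_TO_PIECE = {token_id: piece for piece, token_id in TOY_VOCAB.items()}

def encode(pieces: list[str]) -> list[int]:
    """Map already-split text pieces to their integer IDs."""
    return [TOY_VOCAB[piece] for piece in pieces]

def decode(ids: list[int]) -> str:
    """Map IDs back to text by concatenating their pieces."""
    return "".join(ID_TO_PIECE[token_id] for token_id in ids)

# The leading spaces are deliberate: many BPE-style tokenizers attach a word's
# preceding space to the token itself.
ids = encode(["How", " tokens", " work"])
print(ids)          # [4438, 11460, 990]
print(decode(ids))  # How tokens work
```

Everything downstream of `encode` works only with the list of integers; the text comes back only when the output IDs are decoded.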

Why models use tokens instead of words or letters

Two extremes, both bad:

Whole words: the vocabulary would be enormous and would break on any word it had never seen.
Single characters: sequences would be absurdly long and the model would waste capacity relearning how letters form words.

Tokens are the engineered middle: a fixed vocabulary (typically tens of thousands to a few hundred thousand entries, depending on the model) of common chunks that can assemble any text, including words the model has never encountered, by gluing pieces together. It’s the compression that makes the whole thing tractable.
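To see how a fixed vocabulary can still cover words it never stored whole, here is a toy Python sketch using greedy longest-match splitting. This is a deliberate simplification, not the algorithm any particular production tokenizer uses (many learn their vocabulary from data, for example with byte-pair encoding), and the vocabulary is hypothetical.

```python
import string

# Hypothetical toy vocabulary: a few multi-character chunks plus every single
# lowercase letter, so any lowercase word can always be assembled.
VOCAB = {"token", "ization", "un", "break", "able", "the", "code"} | set(string.ascii_lowercase)

def tokenize(word: str, vocab: set[str] = VOCAB) -> list[str]:
    """Greedy longest-match: at each position, take the longest chunk in the vocab."""
    pieces: list[str] = []
    i = 0
    while i < len(word):
        # Try the longest possible chunk first, shrinking until one matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {word[i]!r}")
    return pieces

print(tokenize("tokenization"))  # ['token', 'ization']
print(tokenize("unbreakable"))   # ['un', 'break', 'able']
print(tokenize("zyx"))           # ['z', 'y', 'x']
```

Neither "tokenization" nor "unbreakable" is in the vocabulary, yet both are assembled from known chunks, and "unbreakable" becomes 3 pieces instead of 11 characters. The single-letter fallback is what keeps the vocabulary from ever failing on an unseen word.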

Why this explains your bill

Major API providers, including Anthropic, price per token, and they count both directions:

Input tokens: everything you send — your prompt, the system prompt, the conversation history, retrieved documents, tool definitions. All of it.
Output tokens: everything the model generates back. Output is typically priced several times higher than input.
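A minimal Python sketch of why the input side adds up: every part of the request is summed, not just the message you typed. The part names, sizes, and the ~4 characters per token heuristic are assumptions for illustration; the sketch also ignores any per-message formatting overhead an API may add.

```python
def rough_tokens(text: str) -> int:
    """Rough English heuristic: ~4 characters per token (not a real tokenizer)."""
    return round(len(text) / 4)

def estimate_input_tokens(system: str, tools: list[str], documents: list[str],
                          history: list[str], prompt: str) -> int:
    """Sum every part of the request, because all of it is billed as input."""
    parts = [system, *tools, *documents, *history, prompt]
    return sum(rough_tokens(part) for part in parts)

# Hypothetical request: placeholder strings stand in for real content.
system = "s" * 2_000                     # ~500 tokens of instructions
tools = ["t" * 1_200]                    # ~300 tokens of tool definitions
documents = ["d" * 8_000]                # ~2,000 tokens of retrieved context
history = ["h" * 1_600, "h" * 1_600]     # ~800 tokens of earlier turns
prompt = "p" * 200                       # ~50 tokens: the part you actually typed

total = estimate_input_tokens(system, tools, documents, history, prompt)
print(total, "input tokens, of which", rough_tokens(prompt), "are the new prompt")
```

In this made-up request the typed prompt is about 50 of roughly 3,650 input tokens; the rest is context you pay for on every call.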

This is why costs surprise people:

Long context isn’t free. If you stuff 50 pages into the prompt “just in case,” you pay for all of it on every call.
Conversation history compounds. In a chat, each new turn resends the whole prior conversation as input. Turn 20 is paying for turns 1–19 again.
Verbose output costs more than verbose input. Asking for a 2,000-word answer is pricier than sending a 2,000-word prompt.
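The history effect is easy to underestimate, so here is a small Python sketch of it. The fixed 100-token questions and 300-token answers are assumptions chosen to keep the arithmetic simple; real turns vary in length.

```python
def input_tokens_for_call(turn: int, message_tokens: int, reply_tokens: int,
                          system_tokens: int = 0) -> int:
    """Input billed on call number `turn` (1-based) of a chat.

    Assumes every user message and every reply have a fixed size, and that each
    call resends the system prompt plus all earlier messages and replies.
    """
    earlier_turns = turn - 1
    return system_tokens + earlier_turns * (message_tokens + reply_tokens) + message_tokens

def total_input_tokens(turns: int, message_tokens: int, reply_tokens: int,
                       system_tokens: int = 0) -> int:
    """Input billed across the whole conversation."""
    return sum(input_tokens_for_call(t, message_tokens, reply_tokens, system_tokens)
               for t in range(1, turns + 1))

# Hypothetical chat: 100-token questions, 300-token answers, 20 turns.
print(input_tokens_for_call(1, 100, 300))    # 100
print(input_tokens_for_call(20, 100, 300))   # 7700
print(total_input_tokens(20, 100, 300))      # 78000
```

The user typed only 2,000 tokens across those 20 questions, yet 78,000 input tokens were billed. Because each call resends everything before it, total input grows roughly with the square of the number of turns.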

Your bill ≈ (input tokens × input price) + (output tokens × output price)

where input tokens = prompt + history + docs + tools, and output tokens = the model's reply.

A worked intuition

Say input is priced at $3 per million tokens and output at $15 per million (illustrative — check current rates). You send a 1,000-token prompt and get a 500-token answer:

Input: 1,000 × ($3 / 1,000,000) = $0.003
Output: 500 × ($15 / 1,000,000) = $0.0075
One call = $0.0105 (about one cent)
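The same arithmetic as a Python sketch, using `Decimal` so the dollar amounts are exact. The $3 and $15 per million tokens are the illustrative prices from this example, not current rates, and the 10,000-call scenario is hypothetical.

```python
from decimal import Decimal

# Illustrative prices from the example above, in USD per million tokens.
# These are assumptions for the arithmetic only; check current rates.
INPUT_PRICE_PER_MTOK = Decimal("3")
OUTPUT_PRICE_PER_MTOK = Decimal("15")

def call_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one call: input tokens x input price + output tokens x output price."""
    # Prices are per million tokens, so divide the weighted sum by 1,000,000.
    weighted = input_tokens * INPUT_PRICE_PER_MTOK + output_tokens * OUTPUT_PRICE_PER_MTOK
    return weighted / Decimal(1_000_000)

one_call = call_cost(1_000, 500)
print(f"one call: ${one_call:.4f}")                 # one call: $0.0105

# Same 500-token reply, but history has grown the input to 20,000 tokens, over 10,000 calls.
at_scale = call_cost(20_000, 500) * 10_000
print(f"10,000 bloated calls: ${at_scale:,.2f}")    # 10,000 bloated calls: $675.00
```

At the original 1,000-token input, those same 10,000 calls would cost $105. The reply did not change; the extra $570 is entirely resent input.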

Tiny — until you multiply by thousands of calls, or let conversation history balloon each call’s input to 20,000 tokens. That’s where bills come from: not one expensive call, but token count × call count.

How to reason about token cost

Count tokens, not words, when estimating cost.
Trim the prompt to what’s needed. Every “just in case” paragraph is paid for on every call.
Watch history growth in chat apps — prune or summarize old turns.
Constrain output length when you don’t need an essay.
Cache the stable prefix if you reuse the same big context repeatedly (covered later in this series — see prompt caching).
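Pruning history can be as simple as keeping the most recent turns that fit a token budget. Here is a minimal Python sketch, assuming the ~4 characters per token heuristic and treating each turn as one string. A production version would likely need to keep user and assistant messages paired, and summarizing old turns is an alternative to dropping them.

```python
def rough_tokens(text: str) -> int:
    """Rough English heuristic: ~4 characters per token (an assumption, not a real tokenizer)."""
    return max(1, round(len(text) / 4))

def prune_history(turns: list[str], budget: int) -> list[str]:
    """Keep the most recent turns whose estimated token total fits in `budget`.

    Oldest turns are dropped first; the kept turns stay in their original order.
    """
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):       # walk from newest to oldest
        cost = rough_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()                     # restore chronological order
    return kept

history = ["a" * 400, "b" * 400, "c" * 400, "d" * 400]  # ~100 tokens each
print(len(prune_history(history, budget=250)))  # 2
```

With a 250-token budget only the two newest turns survive, so every later call stops paying for the first two.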

Common misconceptions

“The model reads my text.” No — it reads token IDs. Your words are converted first.
“One word = one token.” Often, but long/rare words split into multiple tokens, and spaces/punctuation count.
“Only output costs money.” Both directions are billed; input is usually cheaper per token but there’s a lot more of it.
“A bigger context window is free to use.” The capacity is available; using it costs tokens on every call.

Frequently asked questions

How many tokens is a typical page of text? Roughly 500–800 tokens per page of prose, but it varies with formatting and vocabulary.

Why do code and JSON sometimes cost more tokens than they look? Symbols, indentation, and braces each tokenize separately, so structured text can be token-dense relative to its character count.
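A crude way to see this density effect without a real tokenizer is to split text into words and individual symbols and compare characters per piece. The regex below is an assumed stand-in for a tokenizer's pre-splitting step, not any real model's rule; real tokenizers often merge common symbol runs, so actual ratios will differ.

```python
import re

# Assumed proxy: a word (with optional leading space), a single punctuation
# character (with optional leading space), or a run of whitespace is one piece.
PIECE = re.compile(r" ?\w+| ?[^\w\s]|\s+")

def chars_per_piece(text: str) -> float:
    """Average characters per piece; lower means more pieces for the same text."""
    pieces = PIECE.findall(text)
    return len(text) / len(pieces) if pieces else 0.0

prose = "The service returned an error because the request was missing a field."
json_text = '{"error": {"code": 400, "field": ["name", "id"]}}'
print(round(chars_per_piece(prose), 2))
print(round(chars_per_piece(json_text), 2))
```

With this proxy the prose sentence averages about 5.4 characters per piece and the JSON about 1.8, because every brace, quote, colon, and comma becomes its own piece.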

Does the system prompt count? Yes. The system prompt, tool definitions, and any retrieved context are all input tokens you pay for on every call.

Is there a way to avoid resending the same big context every time? Yes — prompt caching lets you reuse a stable prefix at a fraction of the cost. That’s a later tutorial in this series.

Where this fits in the series

This is the foundation of How Claude Actually Works. Once you see that everything is tokens, the rest of the course (context windows, tools, caching, cost control) clicks into place.
