What Is a Token? (LLM Tokens Explained)

A token is the basic unit a large language model reads and generates — usually a sub-word chunk rather than a whole word. Tokens decide how much your API calls cost, how much text fits in a model's context window, and how fast you hit rate limits, so understanding them is the difference between guessing and knowing. This guide explains what a token actually is, how tokens map to words and characters, why the same text produces different counts on different models, and how to count them exactly. Check any text against the token counter tool as you read.

What a Token Is

Models do not process raw text; they process sequences of tokens. Each token is an entry in a fixed vocabulary that was settled before the model was trained, and the model works with each entry's integer ID.

Most modern LLMs, including OpenAI's GPT models, produce tokens with a byte-pair-encoding (BPE) tokenizer. When the tokenizer is built, BPE starts from individual characters (or, in byte-level variants such as OpenAI's tiktoken encodings, from raw bytes) and repeatedly merges the most frequent adjacent pair into a new single token, building up a vocabulary of common character sequences. The result is that frequent words and word fragments become single tokens, while rare or unusual strings get broken into smaller pieces. A token can be:

  • a whole common word, often including its leading space (" token", space included)
  • a word fragment (token + ization for "tokenization")
  • a single character, or even part of one (byte-level tokenizers may split a rare symbol, emoji, or non-Latin character across several tokens)
  • punctuation or whitespace on its own
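To make the merge loop concrete, here is a minimal character-level BPE trainer. It is a teaching sketch, not how production tokenizers are built: real ones such as OpenAI's tiktoken encodings operate on bytes, pre-split text before merging, and learn vocabularies of roughly 100,000 to 200,000 entries from huge corpora. The tiny corpus and the helper names (train_bpe, merge_symbols, encode) are made up for illustration.

```python
from collections import Counter


def merge_symbols(symbols, pair):
    """Replace each adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-separated corpus."""
    # Start from single characters: each word becomes a tuple of 1-char symbols.
    words = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair; first seen wins ties
        merges.append(best)
        # Fuse the winning pair everywhere before the next round.
        fused = Counter()
        for symbols, freq in words.items():
            fused[merge_symbols(symbols, best)] += freq
        words = fused
    return merges, words


def encode(word, merges):
    """Tokenize a word by replaying the learned merges in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return list(symbols)


merges, vocab_counts = train_bpe("token token token tokens tokenize", num_merges=4)
print(merges)                          # [('t', 'o'), ('to', 'k'), ('tok', 'e'), ('toke', 'n')]
print(encode("tokenization", merges))  # ['token', 'i', 'z', 'a', 't', 'i', 'o', 'n']
```

Because "token" appears in every word of the corpus, its five characters fuse into one symbol after four merges. Encoding the unseen word "tokenization" reuses that learned symbol, while the unfamiliar tail stays as single characters; a real vocabulary would likely have learned merges for the tail too.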

The key point: tokens are a unit of machine processing, not human language. The model never sees your words or letters directly — only the token IDs they map to.

Words vs Tokens vs Characters

For typical English prose, one token is about 0.75 words and about 4 characters. So 100 tokens is roughly 75 words, and 1,000 tokens is roughly 750 words or about 4,000 characters. These are approximations, but they are accurate enough for back-of-the-envelope budgeting.

A handy reference table:

Tokens     Words (approx)   Characters (approx)
-------    --------------   -------------------
   100          ~75              ~400
 1,000         ~750            ~4,000
 4,000       ~3,000           ~16,000
10,000       ~7,500           ~40,000
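The table is just the two rules of thumb applied to a token count. A small sketch that regenerates it; the constants are the approximate English-prose ratios from this section, not values read from any tokenizer:

```python
# Rules of thumb for typical English prose; real counts depend on the tokenizer.
WORDS_PER_TOKEN = 0.75
CHARS_PER_TOKEN = 4


def approx_words(tokens: int) -> int:
    return round(tokens * WORDS_PER_TOKEN)


def approx_chars(tokens: int) -> int:
    return tokens * CHARS_PER_TOKEN


for tokens in (100, 1_000, 4_000, 10_000):
    print(f"{tokens:>7,}  ~{approx_words(tokens):,} words  ~{approx_chars(tokens):,} characters")
```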

Worked example: the sentence "The quick brown fox jumps over the lazy dog." is 9 words and 44 characters. At roughly 4 characters per token, that estimates to about 11 tokens, and the actual count lands close to that: on OpenAI's tokenizers each word here is common enough to be a single token (leading space included), and the trailing period adds one more, for 10 tokens in total.
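The same arithmetic works in the other direction, from text to an estimated token count. This sketch applies both rules of thumb to the example sentence; they disagree slightly (11 versus 12), a reminder that they are budgeting aids, not counts.

```python
import math

sentence = "The quick brown fox jumps over the lazy dog."
n_words = len(sentence.split())
n_chars = len(sentence)

# Two rules of thumb for English prose, rounded up to whole tokens.
estimate_from_chars = math.ceil(n_chars / 4)     # ~4 characters per token
estimate_from_words = math.ceil(n_words / 0.75)  # ~0.75 words per token

print(n_words, n_chars, estimate_from_chars, estimate_from_words)  # 9 44 11 12
```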

The ratios shift with content. Plain English prose is the most efficient. Code, JSON, non-English languages, and unusual formatting tokenize less efficiently — they produce more tokens per character because the tokenizer's vocabulary was optimized for natural English text and has to fall back to smaller chunks for everything else.
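One contributing factor for non-English text is visible without any tokenizer at all. Byte-level BPE tokenizers, such as OpenAI's tiktoken encodings, start from UTF-8 bytes, and characters outside basic Latin take more bytes: 2 for Cyrillic, 3 for Japanese kana, 4 for most emoji. Unless the vocabulary has learned merges for those byte sequences, each such character can cost more than one token. This sketch only measures bytes; it does not tokenize anything, and the sample strings are arbitrary.

```python
def utf8_profile(text):
    """Return (characters, UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))


samples = {
    "English": "hello",
    "Russian": "привет",
    "Japanese": "こんにちは",
    "Emoji": "👋",
}

for label, text in samples.items():
    chars, n_bytes = utf8_profile(text)
    print(f"{label:<9} {chars} chars -> {n_bytes} bytes")
```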

How BPE Splitting Works

The fastest way to build intuition is to watch a sentence break apart. Common words map to single tokens; long or rare words shatter into several. Here is how a tokenizer might split a sentence (each | marks a token boundary, and · marks a leading space that belongs to the token):

Input:  "Tokenization is unambiguously delightful."

Tokens: "Token" | "ization" | "·is" | "·un" | "amb" | "ig" |
        "uously" | "·delightful" | "."

Count:  9 tokens for 4 words

Notice three things. First, delightful is common enough to be a single token, and it carries its leading space (·delightful). Second, is is one token, also with its leading space. Third, the long, less-common words Tokenization and unambiguously split into multiple fragments, which is exactly why a "4-word" sentence costs 9 tokens.

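This also shows that spaces and punctuation are not free. A leading space is usually folded into the word token that follows it rather than costing a token of its own, but punctuation such as the trailing period often takes a whole token, and unusual whitespace (deep indentation, long runs of blank lines) can add tokens too. When you are estimating cost, whitespace and punctuation count.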
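The illustrative split of "Tokenization is unambiguously delightful." can be reproduced with a toy tokenizer. This is a sketch using greedy longest-match over a hypothetical nine-entry vocabulary; it is not real BPE (which replays learned merge rules) and not any model's actual vocabulary. It does show leading spaces living inside tokens and the period getting a token of its own.

```python
# Hypothetical vocabulary chosen to reproduce the illustrative split; not a real model's.
VOCAB = {"Token", "ization", " is", " un", "amb", "ig", "uously", " delightful", "."}


def greedy_tokenize(text, vocab):
    """Split text by always taking the longest vocabulary entry that matches next."""
    max_len = max(len(v) for v in vocab)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            # Nothing in the vocabulary matches: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


text = "Tokenization is unambiguously delightful."
tokens = greedy_tokenize(text, VOCAB)
print(" | ".join(t.replace(" ", "·") for t in tokens))
print(len(tokens), "tokens for", len(text.split()), "words")  # 9 tokens for 4 words
```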

Why the Same Text Differs Across Models

The same text produces different token counts on different models because each model family uses a different tokenizer with a different vocabulary. A tokenizer trained on a larger or differently-balanced vocabulary will merge different character sequences, so the boundaries — and therefore the count — move.

OpenAI's tokenizer, tiktoken, is open source, and the encoding depends on the model:

  • GPT-4o and GPT-4.1 use o200k_base
  • GPT-4 and GPT-3.5 Turbo use cl100k_base

Anthropic does not publish Claude's tokenizer. To get an exact Claude token count you must call Anthropic's count_tokens API; otherwise, estimate at roughly 3.6 to 4 characters per token. Critically, do not use OpenAI's tokenizer to count Claude tokens — the vocabularies are different, so tiktoken will give you wrong numbers for Claude. Always count with the tokenizer that matches the model you are actually calling.
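The rule "count with the tokenizer that matches the model" can be written as a small dispatch table. This sketch only decides which counter to use: the model-name prefixes and encodings come from the list in this section, while the function name and the returned labels (tiktoken:<encoding>, anthropic:count_tokens) are hypothetical placeholders. Prefix matching is crude, so a model outside the listed families may need its own entry; unknown models raise an error instead of silently borrowing another family's tokenizer.

```python
# Prefix -> tiktoken encoding, per the model list in this section. Longer, more
# specific prefixes come first so "gpt-4o" and "gpt-4.1" are not caught by "gpt-4".
OPENAI_ENCODINGS = [
    ("gpt-4.1", "o200k_base"),
    ("gpt-4o", "o200k_base"),
    ("gpt-4", "cl100k_base"),
    ("gpt-3.5", "cl100k_base"),
]


def counting_method(model: str) -> str:
    """Return a hypothetical label naming the tokenizer (or API) to count with."""
    name = model.lower()
    if name.startswith("claude"):
        # Claude's tokenizer is unpublished, so exact counts come from Anthropic's API.
        return "anthropic:count_tokens"
    for prefix, encoding in OPENAI_ENCODINGS:
        if name.startswith(prefix):
            return f"tiktoken:{encoding}"
    # Refuse to guess: another family's tokenizer would return wrong numbers.
    raise ValueError(f"no known tokenizer for {model!r}")


print(counting_method("gpt-4o-mini"))  # tiktoken:o200k_base
```

With tiktoken installed, the encoding name is what you would pass to tiktoken.get_encoding, and len(enc.encode(text)) gives the count; tiktoken's own encoding_for_model function performs this lookup for OpenAI model names and is the safer choice when the library is available.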

Why Tokens Matter

Tokens are not an academic detail — they are the unit that everything practical is measured in. Three things depend directly on token counts:

1. Pricing

API pricing is per-token, and input and output tokens are priced separately (output is usually more expensive). The text you send and the text the model generates both count. Estimating tokens before you call is how you forecast cost; counting them after is how you reconcile your bill.
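Because input and output are priced separately, a cost forecast is two multiplications and a sum. A minimal sketch, assuming prices quoted per million tokens (a common convention); the prices in the example are hypothetical placeholders, not any provider's actual rates:

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimated cost in dollars, with prices quoted per million tokens."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical prices: output priced higher than input, as is typical.
cost = estimate_cost(
    input_tokens=12_000,
    output_tokens=800,
    input_price_per_m=3.00,
    output_price_per_m=12.00,
)
print(f"${cost:.4f}")  # $0.0456
```

Running the same function on the token usage reported back with each response is one way to reconcile the estimate against the bill.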

2. Context windows

A model's context window (the maximum amount of text it can consider at once) is measured in tokens, not words or characters. Window sizes vary widely by model, and some current models accept a million tokens or more, but every token of system prompt, conversation history, retrieved documents, and the model's own reply has to fit inside that budget. When a request fails for being too long, it is the token count, not the character count, that blew the limit.
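Checking whether a request will fit is a sum over every component plus the room you reserve for the reply. A sketch with hypothetical token counts and a hypothetical 128,000-token window; in practice each count would come from the model's real tokenizer:

```python
def fits_in_context(prompt_parts, max_output_tokens, context_window):
    """Return (fits, headroom): does the input plus the reserved reply fit?"""
    used = sum(prompt_parts.values()) + max_output_tokens
    return used <= context_window, context_window - used


# Hypothetical token counts for each part of a request.
parts = {"system": 1_200, "history": 18_500, "retrieved_docs": 96_000, "user": 300}
ok, headroom = fits_in_context(parts, max_output_tokens=4_096, context_window=128_000)
print(ok, headroom)  # True 7904
```

Reserving room for the reply up front matters, because the model's output draws on the same budget as the prompt.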

3. Rate limits

API rate limits are frequently expressed as tokens per minute (TPM), sometimes alongside requests per minute. If you are processing large documents in a loop, it is usually the TPM ceiling — not the request count — that throttles you.
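When a loop over large documents is throttled by tokens per minute, a quick plan shows how many minutes of budget the job needs. This is a greedy packing sketch with a hypothetical TPM limit, not a rate limiter. Providers differ in exactly what counts toward TPM (for example, whether the requested output length is included), so treat each document's figure as its expected input plus output tokens.

```python
def plan_batches(doc_token_counts, tpm_limit):
    """Greedily group document indexes into per-minute batches under a TPM budget."""
    batches, current, used = [], [], 0
    for i, tokens in enumerate(doc_token_counts):
        if tokens > tpm_limit:
            raise ValueError(f"document {i} needs {tokens} tokens, over the TPM limit alone")
        if used + tokens > tpm_limit:
            # This minute's budget is spent; start the next minute's batch.
            batches.append(current)
            current, used = [], 0
        current.append(i)
        used += tokens
    if current:
        batches.append(current)
    return batches


# Hypothetical per-document token counts (input plus expected output) and TPM limit.
docs = [40_000, 35_000, 30_000, 20_000, 45_000]
print(plan_batches(docs, tpm_limit=100_000))  # [[0, 1], [2, 3, 4]]
```

Two batches means the job needs at least two minutes of TPM budget, however few requests it makes.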

The "Strawberry" Quirk

Models have historically struggled with character-level tasks like "how many r's are in strawberry" for one concrete reason: tokenization happens before the model sees any text, so the model never sees individual characters.

By the time strawberry reaches the model, it has already been collapsed into a few tokens: opaque vocabulary entries, not a string of letters. Asking the model to count the letters inside a token is like asking someone to count the bricks in a building they are only allowed to see from a satellite photo. The information is technically encoded in there somewhere, but it was never presented in a form that makes counting natural.

This is why character-counting, spelling-out, and reversing-a-string tasks were a classic failure mode: they require operating below the token level, on a representation the model does not directly receive. Understanding tokens turns this from a baffling glitch into an expected consequence of how the input is built.
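A toy illustration of what the model actually receives. The three-way split of strawberry and the token IDs below are hypothetical; real tokenizers choose their own boundaries and IDs. The ID list contains no letters at all: counting r's is easy only after mapping IDs back to text, and that mapping is exactly what the model is never given as input.

```python
# Hypothetical split and token IDs; a real tokenizer picks its own pieces and numbers.
pieces = ["st", "raw", "berry"]
vocab = {"st": 1001, "raw": 2002, "berry": 3003}
id_to_piece = {token_id: piece for piece, token_id in vocab.items()}

token_ids = [vocab[p] for p in pieces]
print(token_ids)  # [1001, 2002, 3003]: what the model receives, no letters in sight

# Counting letters requires decoding the IDs back into characters first.
decoded = "".join(id_to_piece[t] for t in token_ids)
print(decoded, decoded.count("r"))  # strawberry 3
```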

How to Actually Count Tokens

Estimating with the 0.75-words-per-token rule is fine for rough budgeting, but when precision matters — fitting a prompt inside a context window, or forecasting a bill — count against the real tokenizer.

  • In the browser: paste your text into the token counter for an instant count, with no install and nothing leaving your machine.
  • For OpenAI models: use the open-source tiktoken (Python) or js-tiktoken (JavaScript) with the encoding that matches your model (o200k_base or cl100k_base).
  • For Claude: call Anthropic's count_tokens API for an exact figure, or estimate at 3.6 to 4 characters per token when an approximation is acceptable.

The companion guide How to Count Tokens for GPT-4 and Claude walks through the code for each, including the exact library calls and the Claude API request.

Try It Live

The token counter tool counts tokens for your text instantly, right in the browser — paste any prompt, document, or code snippet and see the count without sending it anywhere. For the exact code to count tokens programmatically across GPT-4 and Claude, read How to Count Tokens for GPT-4 and Claude, and use the JSON formatter to tidy the structured data you send to and from LLMs.

Frequently Asked Questions

How many tokens is a word?

For typical English text, one word is roughly 1.3 tokens — or, flipped around, one token is about 0.75 words. So 100 tokens is approximately 75 words, and 1,000 tokens is about 750 words. These are averages: common short words like the or and are usually a single token, while long or rare words split into several. Code, JSON, and non-English text run higher (more tokens per word), so treat the ratio as an estimate, not a guarantee.

What is the difference between a token and a word?

A word is a unit of human language; a token is a unit of machine processing. A tokenizer such as a byte-pair-encoding (BPE) tokenizer breaks text into sub-word chunks, so a token can be a whole word (" token"), a word fragment (token + ization), a single character, or even just punctuation or a space. Leading spaces are part of tokens too: " token", space included, is usually a single token. The model never sees words or letters directly; it only ever sees tokens.

Why do GPT-4 and Claude give different token counts for the same text?

Because each model family uses a different tokenizer. The same sentence is split into a different set of sub-word chunks depending on the tokenizer's vocabulary, so the counts genuinely differ. OpenAI's tokenizer is open source (o200k_base for GPT-4o and GPT-4.1, cl100k_base for GPT-4 and GPT-3.5 Turbo). Anthropic does not publish Claude's tokenizer, so exact Claude counts require Anthropic's count_tokens API. Do not use OpenAI's tokenizer to count Claude tokens: it returns wrong numbers.

Why does an LLM struggle to count the letters in a word?

Because the model never sees the letters. Tokenization happens before the model reads anything, so a word like strawberry arrives as a few tokens, not a sequence of individual characters. Asking how many r's are in strawberry is asking the model to inspect something it was never given: the raw characters are hidden behind the tokens. This is why character-level tasks have historically tripped up otherwise capable models.

How do I count tokens exactly?

For a quick check, paste your text into the token counter — it runs in your browser and shows the count instantly. For OpenAI models, use the open-source tiktoken library (Python) or js-tiktoken (JavaScript) with the matching encoding. For Claude, call Anthropic's count_tokens API for an exact number, or estimate with roughly 3.6 to 4 characters per token. Never rely on a plain word count multiplied by a fixed ratio when precision matters — always count against the actual tokenizer.