How to Count Tokens for GPT-4 and Claude

Every major LLM API bills per token, and every model has a context window measured in tokens, so before you send a prompt you usually want to know how many tokens it contains. The catch: OpenAI ships an exact, open-source tokenizer you can run locally, but Claude has no public client-side tokenizer, so an exact Claude count requires a network call to Anthropic's API. This guide shows the correct way to count for both, the right approximation when you can't call the API, and how to turn token counts into dollars. Check any text against the token counter tool, which runs OpenAI's tokenizer in your browser.

What a Token Is

A token is a sub-word unit produced by a Byte Pair Encoding (BPE) tokenizer. The model never sees raw characters or whole words; it sees a sequence of token IDs. Common words map to one token and rarer words split into several. A leading space is usually folded into the token for the word that follows it, while punctuation and longer runs of whitespace tend to become tokens of their own.

Two rules of thumb for English prose:

  • ~0.75 words per token — so 1,000 tokens is roughly 750 words.
  • ~4 characters per token — so a 4,000-character message is roughly 1,000 tokens.

These are averages for ordinary English. Code, JSON, and non-English text tokenize less efficiently — more tokens per character — because they contain more rare sequences, indentation, and punctuation. Use the rules of thumb for a back-of-the-envelope feel; use the exact tokenizer for anything that touches billing or a context-window limit.
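The two rules of thumb can be written as a tiny estimator. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and the constants are the English-prose averages quoted above, not properties of any particular tokenizer.

```python
import math

# Averages for ordinary English prose, taken from the rules of thumb above.
WORDS_PER_TOKEN = 0.75
CHARS_PER_TOKEN = 4.0


def estimate_tokens_from_words(word_count: int) -> int:
    """Rough token estimate from a word count (hypothetical helper)."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def estimate_tokens_from_chars(text: str) -> int:
    """Rough token estimate from the character length of a string."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


print(estimate_tokens_from_words(750))         # 750 words   -> 1000
print(estimate_tokens_from_chars("x" * 4000))  # 4,000 chars -> 1000
```

Rounding up is a deliberate choice: when the estimate feeds a limit check, overestimating slightly is safer than underestimating.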

Why Count Tokens at All

Three concrete reasons:

  • Billing. Every major LLM API charges per token — input and output priced separately. Counting tokens is how you estimate and control cost.
  • Context windows. Each model has a maximum context measured in tokens. If your prompt plus the expected response exceeds it, the request fails or gets truncated. Counting tells you whether you fit.
  • Prompt sizing. When you assemble a prompt from a system message, retrieved documents, chat history, and a user question, token counts tell you how much of each you can afford before you hit the ceiling.
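The context-window and prompt-sizing points come down to simple subtraction. The sketch below assumes you already have a token count for each part of the prompt; the helper name, the 8,192-token window, and the per-part counts are all made-up illustrative values.

```python
from typing import Mapping


def remaining_budget(context_window: int, reserved_output: int,
                     parts: Mapping[str, int]) -> int:
    """Tokens left after the prompt parts and the reserved output are subtracted.

    A negative result means the request would not fit.
    """
    return context_window - reserved_output - sum(parts.values())


# Illustrative numbers only: the window size and the counts are invented.
parts = {
    "system": 400,
    "retrieved_docs": 6_000,
    "chat_history": 1_200,
    "user_question": 150,
}
left = remaining_budget(context_window=8_192, reserved_output=1_000, parts=parts)
print(left)  # -558: over budget, so something has to be trimmed
```

Keeping the parts in a mapping makes it easy to see which component to trim first, which is usually the retrieved documents or the oldest chat history.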

OpenAI: Exact Counts in Python (tiktoken)

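OpenAI publishes tiktoken, the same tokenizer its models use, so counts are exact and everything runs locally. (tiktoken downloads each encoding's vocabulary file the first time you use it and caches it, so the very first call needs network access.) The one thing to get right is the encoding, which differs by model: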

  • o200k_base — GPT-4o, GPT-4o mini, GPT-4.1
  • cl100k_base — GPT-4, GPT-4 Turbo, GPT-3.5

You can let tiktoken resolve the encoding from a model name, or request the encoding directly:

# pip install tiktoken
import tiktoken

text = "Count these tokens, please."

# Resolve the encoding from the model name (recommended)
enc = tiktoken.encoding_for_model("gpt-4o")   # -> o200k_base
print(len(enc.encode(text)))

# Or request an encoding directly
enc = tiktoken.get_encoding("o200k_base")
print(len(enc.encode(text)))

# For GPT-4 / GPT-4 Turbo / GPT-3.5
enc = tiktoken.get_encoding("cl100k_base")
print(len(enc.encode(text)))

This counts the tokens in a raw string. A real chat request also adds a few tokens per message for the role and formatting overhead — see the pitfalls section below.

OpenAI: Exact Counts in JavaScript (gpt-tokenizer)

Browsers and Node have no built-in tokenizer, so use a pure-JS port. gpt-tokenizer is dependency-free and browser-friendly:

// npm install gpt-tokenizer
import { encode } from 'gpt-tokenizer';

const text = 'Count these tokens, please.';
console.log(encode(text).length);

// The default import uses a single default encoding, which has changed between
// major versions of gpt-tokenizer. To be explicit, import the model's submodule:
//   import { encode } from 'gpt-tokenizer/model/gpt-4o';     // o200k_base
//   import { encode } from 'gpt-tokenizer/model/gpt-4';      // cl100k_base

js-tiktoken is another solid option with the same exact-count guarantee. The Janeer token counter runs gpt-tokenizer client-side, so you can paste text and see the OpenAI token count instantly without anything leaving your browser.

Claude: Counting with the count_tokens API

This is the key correctness point: Anthropic does not ship a public client-side tokenizer. You cannot compute an exact Claude token count offline, and you must not reuse OpenAI's tokenizer for it — tiktoken and gpt-tokenizer produce wrong counts for Claude because Claude's models use a different tokenizer.

The only exact source is Anthropic's count_tokens endpoint. It takes a real request shape (model + messages), returns the input token count, and does not generate a response — so it is cheap and accurate. It needs an API key and a network round-trip.

Python, with the official anthropic SDK:

# pip install anthropic
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
text = "Count these tokens, please."

result = client.messages.count_tokens(
    model="claude-opus-4-8",
    messages=[{"role": "user", "content": text}],
)
print(result.input_tokens)

TypeScript, with the official SDK:

// npm install @anthropic-ai/sdk
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY

const result = await client.messages.countTokens({
  model: 'claude-opus-4-8',
  messages: [{ role: 'user', content: 'Count these tokens, please.' }],
});

console.log(result.input_tokens);

Use the current Claude model IDs exactly, with no date suffix: claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5. Because count_tokens takes the same messages array you would send to generate text, it already accounts for the per-message role and formatting overhead. The number it returns is the input token count to expect on that request; Anthropic's documentation describes it as an estimate, so the billed figure can differ by a small amount.

Estimating Claude Tokens Without the API

When you genuinely can't make a network call — a browser-only tool, an offline script, a quick sanity check — fall back to a character-based estimate. A reasonable approximation for Claude is about 3.6 to 4 characters per token:

// Rough, offline ESTIMATE only — not an exact count
function estimateClaudeTokens(text) {
  return Math.ceil(text.length / 3.8);
}

console.log(estimateClaudeTokens('Count these tokens, please.'));

Always label this as an estimate, and pad your headroom against context-window limits rather than trusting it to the last token. For anything that affects billing accuracy or a hard limit, call count_tokens — the estimate exists only for the cases where you can't.
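A Python variant of the same idea can return a range instead of a single number, which makes the uncertainty visible. This is a sketch under the same assumption as above (roughly 3.6 to 4 characters per token); the function names and the 10% default headroom are hypothetical choices, not anything Anthropic publishes.

```python
import math
from typing import Tuple

# ASSUMPTION: the 3.6 to 4 characters-per-token range is the rough figure
# quoted above, not a property of Claude's real tokenizer.
MIN_CHARS_PER_TOKEN = 3.6
MAX_CHARS_PER_TOKEN = 4.0


def estimate_claude_token_range(text: str) -> Tuple[int, int]:
    """Return a (low, high) token ESTIMATE for Claude, never an exact count."""
    low = math.ceil(len(text) / MAX_CHARS_PER_TOKEN)
    high = math.ceil(len(text) / MIN_CHARS_PER_TOKEN)
    return low, high


def probably_fits(text: str, token_limit: int, headroom: float = 0.10) -> bool:
    """True if the high estimate, padded by `headroom`, stays within the limit."""
    _, high = estimate_claude_token_range(text)
    return high * (1 + headroom) <= token_limit


print(estimate_claude_token_range("Count these tokens, please."))  # (7, 8)
```

Checking the padded high end of the range against the limit is the conservative option: it may reject a prompt that would have fit, but it should rarely accept one that does not.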

Turning Tokens Into Cost

Pricing is quoted per million tokens, and input and output are priced separately. The formula:

cost = (tokens / 1,000,000) × price_per_1M

total = (input_tokens  / 1,000,000) × input_price
      + (output_tokens / 1,000,000) × output_price

You know the input token count before you send the request. You only know the output token count after the model responds, so output cost is an estimate until then — budget for a plausible response length.

Approximate per-1M-token rates as of June 2026 (verify on the official OpenAI and Anthropic pricing pages, since rates change):

Model              Input ($/1M)   Output ($/1M)
GPT-4o                 2.50           10.00
GPT-4o mini            0.15            0.60
Claude Opus 4.8        5.00           25.00
Claude Sonnet 4.6      3.00           15.00
Claude Haiku 4.5       1.00            5.00

Worked example: a 2,000-token prompt to Claude Sonnet 4.6 that returns a 500-token answer costs (2,000 / 1,000,000) × $3.00 + (500 / 1,000,000) × $15.00 = $0.006 + $0.0075 = $0.0135, or about 1.35 cents per call.
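The same arithmetic as a small Python helper, for anyone who wants to script it. The rates are copied from the table above and are approximate; the dictionary keys and the function name are illustrative, so treat this as a sketch rather than a pricing source.

```python
# Per-1M-token rates in USD as (input, output), copied from the table above.
# They are approximate: verify against the official pricing pages before use.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-opus-4-8": (5.00, 25.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-haiku-4-5": (1.00, 5.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call: input and output are priced separately."""
    input_price, output_price = PRICES[model]
    return ((input_tokens / 1_000_000) * input_price
            + (output_tokens / 1_000_000) * output_price)


# The worked example: 2,000 tokens in, 500 tokens out, on Claude Sonnet 4.6.
cost = estimate_cost("claude-sonnet-4-6", 2_000, 500)
print(f"${cost:.4f}")  # $0.0135
```

Before the call is made, output_tokens is a guess, so pass the largest response length you are willing to pay for to get an upper bound.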

Common Pitfalls

Using tiktoken or gpt-tokenizer for Claude

The single most common mistake. OpenAI's tokenizer is not Claude's tokenizer — running Claude text through tiktoken gives a number that is simply wrong, sometimes off by 10–20% or more. For Claude, use count_tokens for exact counts, or the character-based estimate when you can't.

Forgetting output tokens

Cost and context-window math need both directions. The input count you computed up front is only half the bill, and the response also consumes context-window budget. Always reserve room for the output you expect.

Counting raw text instead of the message envelope

A chat request is not just your string: each message carries a small per-message overhead for its role and formatting. Counting the raw string undercounts a real request by a few tokens per message. Anthropic's count_tokens handles this for you because you pass the full messages array; for OpenAI you add the per-message overhead described in OpenAI's cookbook (a few tokens per message, plus a few more that prime the reply) on top of the string counts.
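For OpenAI, the envelope overhead can be approximated in a few lines. The sketch below assumes 3 tokens per message and 3 tokens to prime the reply, which is the guidance in OpenAI's cookbook for GPT-3.5 and GPT-4 era chat models and may not hold for every model. The function name is hypothetical, and the encoder is passed in as a callable so the example runs without any third-party package.

```python
from typing import Callable, Dict, List, Sequence

# ASSUMPTION: overhead constants follow OpenAI's cookbook guidance for
# GPT-3.5 / GPT-4 era chat models; other models may differ.
TOKENS_PER_MESSAGE = 3
TOKENS_FOR_REPLY_PRIMING = 3


def count_chat_tokens(messages: List[Dict[str, str]],
                      encode: Callable[[str], Sequence]) -> int:
    """Approximate input tokens for a chat request with string-only content."""
    total = TOKENS_FOR_REPLY_PRIMING
    for message in messages:
        total += TOKENS_PER_MESSAGE
        total += len(encode(message["role"]))
        total += len(encode(message["content"]))
    return total


def whitespace_encode(text: str) -> List[str]:
    """Stand-in encoder for the demo only: it is NOT a real tokenizer."""
    return text.split()


messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Count these tokens, please."},
]
print(count_chat_tokens(messages, whitespace_encode))
```

With tiktoken installed, pass tiktoken.encoding_for_model("gpt-4o").encode as the encode argument in place of the stand-in. Requests that use tools, images, or named participants add further tokens that this sketch does not model.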

Using the wrong OpenAI encoding

GPT-4o-family models use o200k_base; GPT-4 / GPT-4 Turbo / GPT-3.5 use cl100k_base. Mixing them up produces counts that are close but not exact. Resolve the encoding from the model name with encoding_for_model to avoid the trap.

Assuming code and non-English count like English

The ~4-characters-per-token rule is for English prose. Source code, minified JSON, and non-Latin scripts pack more tokens per character, so a 1,000-character JSON blob can be far more than 250 tokens. When the content isn't ordinary English, count it — don't extrapolate from the rule of thumb.

Try It Live

The token counter tool runs OpenAI's tokenizer in your browser — paste any text and see the GPT-4-family token count instantly, with nothing sent to a server. For exact Claude counts you'll still want Anthropic's count_tokens API, but the tool is the fastest way to sanity-check prompt length and the rules of thumb. Pair it with the JSON formatter when you're sizing structured prompts or tool-call payloads.

Frequently Asked Questions

How many tokens is a word, on average?

For typical English prose the rule of thumb is roughly 0.75 words per token, or about 4 characters per token — so 1,000 tokens is around 750 words. These are averages, not guarantees: short common words are often a single token, while long or rare words split into several. Code, JSON, and non-English text tokenize less efficiently (more tokens per character) because they contain more rare sequences, whitespace, and punctuation. For billing or context-window decisions never rely on the rule of thumb — count exactly with the model's tokenizer.

How do I count tokens for GPT-4 exactly?

Use OpenAI's open-source tiktoken library, which is the same tokenizer the models use. In Python: pip install tiktoken, then enc = tiktoken.encoding_for_model("gpt-4o") and len(enc.encode(text)). Pick the right encoding for your model — GPT-4o, GPT-4o mini, and GPT-4.1 use o200k_base, while GPT-4, GPT-4 Turbo, and GPT-3.5 use cl100k_base. In the browser or Node there is no built-in tokenizer, so use a pure-JS port like gpt-tokenizer or js-tiktoken.

Can I count Claude tokens locally without an API call?

Not exactly. Anthropic does not publish a client-side tokenizer, so there is no offline library that produces an exact Claude token count. Do not use tiktoken or gpt-tokenizer for Claude — those are OpenAI's tokenizer and give wrong counts for Claude's models. The only exact source is Anthropic's count_tokens API endpoint, which needs an API key and a network round-trip. If you only need a rough number offline, estimate at about 3.6 to 4 characters per token and clearly label the result as an estimate.

How do I count Claude tokens with the API?

Call Anthropic's count_tokens endpoint through the official SDK. In Python: client.messages.count_tokens(model="claude-opus-4-8", messages=[{"role":"user","content":text}]), then read .input_tokens. In TypeScript: await client.messages.countTokens({ model: "claude-opus-4-8", messages: [...] }) and read .input_tokens. It returns the input token count for that exact request shape — including message and role overhead — without actually generating a response, so it is cheap and accurate for sizing prompts.

How do I turn a token count into a dollar cost?

The formula is cost = (tokens / 1,000,000) × price_per_1M. Input and output tokens are priced separately, so total cost is input tokens times the input rate plus output tokens times the output rate. You know the input token count before you send the request, but you only learn the output token count after the model responds. As of June 2026, GPT-4o is roughly $2.50 per 1M input / $10 output and Claude Opus 4.8 is about $5 / $25 — always verify current numbers on the official OpenAI and Anthropic pricing pages, since rates change.