LLM Token Counter
Count tokens for GPT, Claude, Grok, and Gemini prompts.
Every tool you expose through an MCP server is sent to the model as part of its context on every request: its name, description, and full input schema, all before the user has typed a word. With a handful of servers connected, those definitions can quietly consume a large share of the context window and make it harder for the model to pick the right tool. Paste your tool definitions here to see how many tokens each one costs and what percentage of the window they take. Everything runs in your browser.
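To make the per-tool accounting concrete, here is a minimal Python sketch of that kind of report. It assumes each tool is a dict with the MCP `name`, `description`, and `inputSchema` fields, serializes all three (since all three reach the model), and ranks tools by token count and share of the window. Counting the JSON text is an approximation of whatever wire format a given client uses. The `estimate_tokens` helper is a hypothetical placeholder built on a rough four-characters-per-token rule, not what this page uses; pass a real tokenizer as `count_tokens` for exact numbers.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCost:
    name: str
    tokens: int
    percent_of_window: float


def estimate_tokens(text: str) -> int:
    """Hypothetical placeholder: roughly 4 characters per token.

    For real counts, pass a tokenizer instead, e.g. with tiktoken:
    count_tokens=lambda s: len(tiktoken.get_encoding("o200k_base").encode(s))
    """
    return max(1, round(len(text) / 4))


def tool_costs(
    tools: list[dict],
    context_window: int,
    count_tokens: Callable[[str], int] = estimate_tokens,
) -> list[ToolCost]:
    """Token cost of each MCP tool definition, heaviest first."""
    rows = []
    for tool in tools:
        # name, description and inputSchema all reach the model, so count all three.
        # JSON serialization approximates, but may not match, a client's wire format.
        definition = {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "inputSchema": tool["inputSchema"],
        }
        tokens = count_tokens(json.dumps(definition))
        rows.append(ToolCost(tool["name"], tokens, 100 * tokens / context_window))
    return sorted(rows, key=lambda row: row.tokens, reverse=True)


if __name__ == "__main__":
    tools = [
        {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "inputSchema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
        {"name": "ping", "description": "Health check.", "inputSchema": {"type": "object"}},
    ]
    for row in tool_costs(tools, context_window=200_000):
        print(f"{row.name}: {row.tokens} tokens ({row.percent_of_window:.3f}% of window)")
```

Keeping the tokenizer pluggable means the ranking logic stays the same whichever encoding you use.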
The Model Context Protocol made it easy to give an AI agent dozens of tools by connecting a few servers. The hidden cost is that every one of those tool definitions, with its name, description, and full JSON input schema, is loaded into the model's context window and sent on every request, before the user asks anything. A few connected servers can spend a large fraction of even a generous context window on tool definitions alone, leaving less room for the actual conversation and documents and inflating the cost of every call.
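Because the definitions ride along with every call, their cost is a fixed per-request overhead that adds up across servers and across requests. A small sketch of that arithmetic follows; the server names and per-tool token counts are made up for illustration, and the cumulative figure ignores any prompt-caching discount a provider may apply.

```python
def tool_overhead(servers: dict[str, list[int]], context_window: int, requests: int) -> dict[str, float]:
    """Fixed cost of tool definitions, given per-tool token counts grouped by server."""
    # Every tool on every connected server is sent with every request.
    per_request = sum(sum(tool_tokens) for tool_tokens in servers.values())
    return {
        "per_request": per_request,
        "share_of_window": per_request / context_window,
        "remaining_window": context_window - per_request,
        # The same definitions are resent on each call (no caching assumed).
        "total_over_requests": per_request * requests,
    }


if __name__ == "__main__":
    # Hypothetical servers and token counts, for illustration only.
    servers = {
        "github": [800, 650, 1200],
        "filesystem": [300, 250, 400],
        "browser": [900, 1100],
    }
    print(tool_overhead(servers, context_window=128_000, requests=1_000))
```

With these made-up numbers, 5,600 tokens (about 4.4% of a 128,000-token window) go to tool definitions on every call, or 5.6 million input tokens over 1,000 requests.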
Token cost is only half the problem. As the number of available tools grows, the model gets worse at choosing the right one — selection accuracy degrades noticeably with larger tool sets, which is why a focused set of well-described tools usually outperforms a sprawling one. Measuring the token weight of your definitions is the first step to controlling both: it tells you which tools dominate your budget and which descriptions are doing more harm than good.
The ecosystem responded in late 2025 with structural mitigations — loading tool schemas on demand instead of all upfront, and letting the model invoke tools through generated code rather than one definition-heavy call at a time — both of which sharply cut the upfront token cost. But the cheapest win is still editorial: trim the heaviest descriptions and remove the tools you do not need. This counter shows you where that weight is.
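The saving from on-demand loading is easy to see with a toy model. The sketch below is a hypothetical scheme, not any specific client's implementation: it assumes the model always sees a compact index entry (for example, a name plus a one-line summary) for every tool, and the full definition only for the tools it actually loads. All token figures are illustrative.

```python
def upfront_cost(full_tokens: dict[str, int]) -> int:
    """Every full definition is sent on every request."""
    return sum(full_tokens.values())


def on_demand_cost(full_tokens: dict[str, int], index_tokens: dict[str, int], loaded: list[str]) -> int:
    """Compact index for all tools, plus full definitions only for the tools loaded."""
    return sum(index_tokens.values()) + sum(full_tokens[name] for name in loaded)


if __name__ == "__main__":
    # Hypothetical: 40 tools at ~600 tokens each, ~30-token index entries, 2 tools needed.
    full = {f"tool_{i}": 600 for i in range(40)}
    index = {name: 30 for name in full}
    print(upfront_cost(full))                                  # 24000
    print(on_demand_cost(full, index, ["tool_3", "tool_17"]))  # 2400
```

Trimming descriptions shrinks both terms, which is why the editorial fix still pays off under deferred loading.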
An MCP client places each tool's name, description, and inputSchema in the model's context so the model knows what it can call. That context is sent on every request, so tool definitions are a fixed cost paid before any conversation happens. A single richly described tool can run from a couple hundred to well over a thousand tokens, and the cost grows with every tool and every connected server.
This counter uses the o200k_base tiktoken encoding via the open-source gpt-tokenizer library, which gives exact counts of the pasted text for GPT-4o, GPT-4.1, GPT-5, and the o-series (the provider's own wrapping of tool definitions can add a small overhead on top). Claude, Gemini, and other models use different tokenizers, so for those the numbers are a close estimate, typically within about 10–20%, which is more than accurate enough for budgeting how much of your context window the tools consume.
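For models whose tokenizer differs from o200k_base, one conservative way to budget is to treat the counter's number as the center of a range and plan against the top of it. The sketch assumes the error bound is the rough 20% upper end quoted above; `estimate_range` and `fits_budget` are illustrative helpers, and the bound is a planning margin, not a guarantee. Integer arithmetic is used to avoid floating-point rounding at the edges.

```python
def estimate_range(o200k_tokens: int, error_pct: int = 20) -> tuple[int, int]:
    """Bracket a count from another tokenizer, assuming it is within +/- error_pct of o200k_base."""
    low = o200k_tokens * (100 - error_pct) // 100
    high = -(-o200k_tokens * (100 + error_pct) // 100)  # integer ceiling division
    return low, high


def fits_budget(o200k_tokens: int, budget: int, error_pct: int = 20) -> bool:
    """Conservative check: does the upper end of the range still fit the budget?"""
    return estimate_range(o200k_tokens, error_pct)[1] <= budget


if __name__ == "__main__":
    print(estimate_range(5_000))      # (4000, 6000)
    print(fits_budget(5_000, 6_000))  # True
    print(fits_budget(5_000, 5_999))  # False
```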