How to Cut MCP Tool Token Usage
Every MCP server you connect injects its tool definitions — name, description, and input schema — into the model's context on every single request, before you type anything. Connect a few popular servers and that fixed cost can swallow most of the context window, while tool-selection accuracy drops as the list grows. This guide covers how to measure the per-tool cost and cut it, in order of leverage: editorial trims first, then structural features like tool search and code execution, then the architectural fix of not wrapping every API endpoint. Measure the heaviest definitions with the MCP tool token counter before you start trimming.
TL;DR
MCP (Model Context Protocol) tool definitions are sent to the model on every request, before you type a word — so a bloated tool set is a fixed tax on your context window and a drag on accuracy. Cut it in this order of leverage:
- Measure first. Count the per-tool token cost to find the heaviest definitions. You can't cut what you can't see — start with the MCP tool token counter.
- Editorial (biggest immediate win). Trim each description to "what it does + when to call it," remove redundant tools, and prune verbose input schemas.
- Structural. Use tool search / load-on-demand (load only relevant tools) and code execution with MCP (call tools by writing code). Both were introduced in late 2025, and both are reported to cut large fractions of token usage.
- Architectural. Don't wrap every API endpoint 1:1 as a tool — that's what causes the explosion. Curate to the few intent-shaped tools the agent actually needs.
The MCP ecosystem moves fast — the current spec revision is 2025-11-25 (a 2026-07-28 release candidate exists but is non-final). For spec-version or feature specifics, confirm against the current MCP spec and your client/vendor docs.
Why MCP Tool Definitions Cost Tokens
When an MCP client connects to a server, it calls tools/list to fetch that server's tools. For each tool, the client puts three things into the model's context:
- name: the tool identifier.
- description: the natural-language explanation the model reads to decide whether and how to call it.
- inputSchema: a JSON Schema (draft 2020-12) describing the arguments: properties, types, enums, required fields, nested objects.
The crucial detail: all of this is sent on every request, before the user types anything. It's a fixed cost paid on each turn of the conversation, not a one-time setup. Ten verbose tools across a couple of servers can occupy a large slice of the window before the actual work begins — and every token spent on definitions is a token not available for the conversation, retrieved context, or the model's output.
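To make the "fixed cost on each turn" point concrete, here is a small arithmetic sketch. The inputs (ten tools at about 600 tokens each, a 20-turn conversation) are illustrative assumptions rather than measurements, and the sketch only counts tokens placed in context; it ignores any provider-side prompt caching.

```python
def definition_overhead(tokens_per_tool: int, tool_count: int, turns: int) -> dict:
    """Tokens occupied by tool definitions, per request and summed over a conversation."""
    per_request = tokens_per_tool * tool_count
    return {
        "per_request": per_request,
        "over_conversation": per_request * turns,
    }

# Illustrative assumptions: 10 tools, ~600 tokens each, 20 turns.
overhead = definition_overhead(tokens_per_tool=600, tool_count=10, turns=20)
print(overhead)  # {'per_request': 6000, 'over_conversation': 120000}
```

The per-request figure is what competes with the conversation for window space; the cumulative figure shows why the same definitions keep costing you on every turn.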
This is true regardless of transport. MCP currently defines two transports — stdio (local) and Streamable HTTP (remote); the older HTTP+SSE transport is deprecated. The token cost lives in the definitions themselves, not the wire protocol, so the fixes below apply to local and remote servers alike.
How Bad It Gets
This is the dominant MCP topic of 2026 for a reason. Some reported numbers:
- Context consumption. Connecting several popular servers (for example GitHub + Slack + Sentry) has been measured consuming roughly 70% of a 200K-token context window (about 140K tokens) in tool definitions alone, before the first query. Treat this as a reported measurement, since it depends heavily on which servers and how many tools, but the order of magnitude is the point.
- Per-tool cost. A single tool commonly runs from a couple hundred tokens to well over a thousand, driven by description length and schema depth.
- Accuracy degradation. Tool-selection accuracy drops sharply as the list grows, because the model has more lookalike options to disambiguate. One widely cited 2026 result reported accuracy falling from about 43% to under 14% with a bloated tool set. Treat that as reported research, not gospel, but the direction is consistent.
- Practical ceilings. Real clients start to struggle past a few dozen tools; Cursor's practical ceiling is around 40.
So bloat costs you twice: it burns context and it makes the agent pick the wrong tool more often. That's why trimming pays off on two axes at once.
Measure First
You can't cut what you can't see. Before trimming anything, count the per-tool token cost so you know which definitions are actually heavy — the worst offender is usually one or two tools with a long description and a deep schema, not the whole set evenly.
Paste your tool definitions (the name, description, and inputSchema from tools/list) into the MCP tool token counter to see the token cost per tool and in total. Sort by cost, then trim the heaviest first — that's where the leverage is. After each round of trimming, re-count to confirm the saving landed where you expected.
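If you want a quick offline estimate before reaching for a counter, the ranking step can be sketched in a few lines. This is a rough approximation under a stated assumption: it estimates tokens as serialized characters divided by four, which is a common rule of thumb and not what any specific tokenizer produces. The sample tools are hypothetical and only shaped like the "tools" array of a tools/list result.

```python
import json

CHARS_PER_TOKEN = 4  # Assumption: rough average; real tokenizers differ.

def estimate_tokens(tool: dict) -> int:
    """Approximate the token cost of one tool definition from its serialized size."""
    serialized = json.dumps(
        {key: tool.get(key) for key in ("name", "description", "inputSchema")},
        separators=(",", ":"),
    )
    return max(1, len(serialized) // CHARS_PER_TOKEN)

def rank_by_cost(tools: list[dict]) -> list[tuple[str, int]]:
    """Return (name, estimated_tokens) pairs, heaviest first."""
    costs = [(tool["name"], estimate_tokens(tool)) for tool in tools]
    return sorted(costs, key=lambda pair: pair[1], reverse=True)

# Hypothetical sample data, not taken from a real server.
sample_tools = [
    {
        "name": "get_issue",
        "description": "Fetch one issue by number.",
        "inputSchema": {
            "type": "object",
            "properties": {"number": {"type": "integer"}},
            "required": ["number"],
        },
    },
    {
        "name": "search_issues",
        "description": "Search the issue tracker. " * 20,  # deliberately bloated
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
]

for name, cost in rank_by_cost(sample_tools):
    print(f"{name}: ~{cost} tokens")
```

Use the ranking to decide where to look first, and a real tokenizer-based count to confirm the actual numbers.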
Counting also tells you whether you've crossed the thresholds that trigger client behavior — for example, Claude Code switching to tool search once tools exceed roughly 10% of context.
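A threshold check like that is simple arithmetic. The sketch below uses the roughly-10% figure from the text as a default; the exact rule a given client applies is not specified here, so the strict "greater than" comparison is an assumption.

```python
def exceeds_threshold(definition_tokens: int, context_window: int, threshold: float = 0.10) -> bool:
    """True when tool definitions take more than `threshold` of the context window."""
    return definition_tokens / context_window > threshold

print(exceeds_threshold(6_000, 200_000))    # False (3% of the window)
print(exceeds_threshold(140_000, 200_000))  # True (70% of the window)
```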
Editorial Fixes (Cheapest, Biggest Immediate Win)
Most tool definitions are far longer than they need to be. The model needs to know what the tool does and when to call it, not a marketing pitch, three restatements, or a paragraph-long example. Tighten each description to exactly that. Because clear descriptions also improve selection accuracy, this is a double win.
Here's a typical bloated description, before and after.
Before (~110 words of description — restates itself, includes marketing language and a long example):
{
  "name": "search_issues",
  "description": "This powerful and flexible tool allows you to search for issues in the issue tracker. It is the best way to find issues. You can use it to search for issues by keyword, by author, by label, by status, by assignee, and by many other criteria. The search is full-text and very fast. For example, if you wanted to find all open issues assigned to the user 'alice' that mention the word 'login', you could call this tool with a query like 'is:open assignee:alice login' and it would return a list of matching issues with their titles, numbers, statuses, and URLs. This is extremely useful for triaging and for reporting.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": { "type": "string", "description": "The search query string to use when searching for issues in the tracker." },
      "sort": { "type": "string", "enum": ["created", "updated", "comments", "relevance", "priority", "reactions", "interactions", "author-date", "committer-date"] },
      "order": { "type": "string", "enum": ["asc", "desc"] },
      "per_page": { "type": "integer" },
      "page": { "type": "integer" }
    },
    "required": ["query"]
  }
}
After (two tight sentences; trimmed the enum to the values you actually use; dropped the order and pagination params the agent never needs):
{
  "name": "search_issues",
  "description": "Search the issue tracker with a query string (supports is:, assignee:, label: filters). Use to find or triage issues by keyword or attribute.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": { "type": "string", "description": "Search query, e.g. 'is:open assignee:alice login'." },
      "sort": { "type": "string", "enum": ["created", "updated", "relevance"] }
    },
    "required": ["query"]
  }
}
That single edit cuts the definition by well over half. The editorial checklist:
- Trim the description to "what it does + when to call it." Drop marketing prose ("powerful and flexible"), redundant restatements, and long inline examples — one short example argument is enough.
- Remove redundant or overlapping tools. If two tools do nearly the same thing, the model wastes tokens reading both and picks wrong more often. Keep one.
- Prune verbose input schemas. Long enums (keep only the values you actually use), deep nesting, repeated boilerplate, and unnecessary optional parameters all add up. Every property description is also context.
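The schema-pruning step in the checklist above can also be scripted when you have many tools to clean up. The following is a minimal sketch that assumes a flat object schema (no nested objects); the function name and parameters are hypothetical, not part of any MCP SDK.

```python
import copy

def prune_schema(schema: dict, keep_props: set[str], enum_allow: dict[str, list]) -> dict:
    """Return a copy of a flat object schema with only the kept properties and trimmed enums."""
    pruned = copy.deepcopy(schema)
    props = pruned.get("properties", {})
    pruned["properties"] = {k: v for k, v in props.items() if k in keep_props}
    for name, allowed in enum_allow.items():
        prop = pruned["properties"].get(name)
        if prop and "enum" in prop:
            prop["enum"] = [value for value in prop["enum"] if value in allowed]
    # Never leave a required entry pointing at a property that was dropped.
    if "required" in pruned:
        pruned["required"] = [r for r in pruned["required"] if r in pruned["properties"]]
    return pruned

verbose = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "sort": {"type": "string", "enum": ["created", "updated", "comments", "relevance", "priority"]},
        "order": {"type": "string", "enum": ["asc", "desc"]},
        "per_page": {"type": "integer"},
        "page": {"type": "integer"},
    },
    "required": ["query"],
}

slim = prune_schema(
    verbose,
    keep_props={"query", "sort"},
    enum_allow={"sort": ["created", "updated", "relevance"]},
)
print(sorted(slim["properties"]))          # ['query', 'sort']
print(slim["properties"]["sort"]["enum"])  # ['created', 'updated', 'relevance']
```

Only drop parameters the server genuinely treats as optional, since the model can no longer supply anything you remove from the schema.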
Tighter schemas help here too — if you're tuning the JSON Schemas that drive tool calls, see OpenAI vs Anthropic Structured Outputs for how the two providers constrain and validate them.
Structural Fixes (Don't Send Everything Upfront)
Editorial trimming makes each definition smaller. The structural approaches, introduced in late 2025, change when definitions enter context at all. Both report large reductions. The figures below are Anthropic-reported; your real savings depend on your tool set and workload, and availability varies by client.
Tool search / load-on-demand
Instead of putting all tool definitions in context upfront, the client searches for the tools relevant to the current task and loads only those. The bulk of definitions stays out of context until needed. Anthropic's Tool Search Tool reportedly cut a case from ~77k to ~8.7k tokens, and it's now the default in Claude Code when tools exceed roughly 10% of context.
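The load-on-demand idea can be illustrated with a toy selector. This is not how Anthropic's Tool Search Tool is implemented; it is a deliberately naive word-overlap sketch, with a hypothetical catalog, that only shows the shape of "search first, then load the matches".

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word set; underscores split, so 'search_issues' yields 'search' and 'issues'."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def select_tools(task: str, tools: list[dict], limit: int = 2) -> list[dict]:
    """Return up to `limit` tools whose name and description share the most words with the task."""
    task_words = tokenize(task)
    scored = []
    for tool in tools:
        overlap = len(task_words & tokenize(tool["name"] + " " + tool["description"]))
        if overlap:
            scored.append((overlap, tool))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]

# Hypothetical catalog; only the selected entries would be placed in context.
catalog = [
    {"name": "search_issues", "description": "Search the issue tracker by keyword or attribute."},
    {"name": "post_message", "description": "Post a message to a chat channel."},
    {"name": "list_errors", "description": "List recent error events for a project."},
]

selected = select_tools("find open issues about login", catalog)
print([tool["name"] for tool in selected])  # ['search_issues']
```

A production search would use something more robust than word overlap, but the effect on context is the same: the two unrelated definitions never enter the prompt.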
Code execution with MCP
Rather than emitting one definition-heavy tool call at a time, the model writes code in a sandbox that calls the tools. The definitions don't all need to sit in the prompt, and the model can compose multiple calls in a single program. Anthropic reported a case dropping from ~150k to ~2k tokens (a ~98.7% reduction) with this approach.
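As a rough picture of what "composing multiple calls in a single program" looks like, here is a self-contained sketch. The two wrapper functions are hypothetical stand-ins that return canned data; in a real setup they would forward to MCP tool calls, and the exact wiring depends on your client.

```python
# Hypothetical tool wrappers. In a real setup these would forward to MCP tool calls;
# here they return canned data so the sketch runs on its own.
def search_issues(query: str) -> list[dict]:
    return [
        {"number": 101, "title": "Login fails on Safari", "labels": []},
        {"number": 102, "title": "Login page typo", "labels": ["triaged"]},
        {"number": 103, "title": "Login timeout", "labels": []},
    ]

def add_label(number: int, label: str) -> dict:
    return {"number": number, "label": label, "ok": True}

# The kind of program a model might write inside the sandbox: it chains calls,
# filters intermediate results locally, and hands back only a short summary.
def triage_untagged(query: str) -> str:
    issues = search_issues(query)
    untagged = [issue for issue in issues if "triaged" not in issue["labels"]]
    for issue in untagged:
        add_label(issue["number"], "triaged")
    return f"Labeled {len(untagged)} of {len(issues)} issues as triaged."

print(triage_untagged("is:open login"))  # Labeled 2 of 3 issues as triaged.
```

In this sketch only the one-line summary would need to return to the model; the full issue list stays inside the program.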
Both features depend on client and (sometimes) server support, and the MCP spec is evolving quickly — confirm what your client supports against the current MCP spec and the vendor's docs before relying on a specific number.
Architectural Fix (Don't Wrap Every Endpoint)
The deepest fix is to not create the bloat in the first place. The most common cause of tool explosion is wrapping an existing REST API 1:1 — one MCP tool per endpoint. A 200-endpoint API becomes 200 tools, most of which the agent will never sensibly call, and all of which cost context on every request.
Reportedly, roughly 58% of MCP builders are wrapping existing APIs, and the expert consensus is not to wrap 1:1. Instead, curate to the few high-value, intent-shaped tools the agent actually needs. A good MCP tool maps to a task the agent wants to accomplish ("create a triaged bug from this stack trace") rather than a low-level endpoint ("POST /issues"). One well-designed tool can wrap several API calls behind a single intent, which means fewer definitions, less context, and clearer choices for the model.
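A minimal sketch of that idea, using the "create a triaged bug" example: one function the agent sees, three API calls behind it. FakeTracker, its methods, and the endpoint paths are hypothetical placeholders that record calls instead of making them; they do not describe any real issue-tracker API.

```python
from dataclasses import dataclass, field

@dataclass
class FakeTracker:
    """Hypothetical stand-in for a REST client; records calls instead of making them."""
    calls: list = field(default_factory=list)

    def create_issue(self, title: str, body: str) -> int:
        self.calls.append("POST /issues")
        return 1

    def add_labels(self, number: int, labels: list) -> None:
        self.calls.append(f"POST /issues/{number}/labels")

    def assign(self, number: int, user: str) -> None:
        self.calls.append(f"POST /issues/{number}/assignees")

def create_triaged_bug(tracker: FakeTracker, stack_trace: str, owner: str) -> dict:
    """One intent-shaped tool: file a bug from a stack trace, label it, and assign it."""
    lines = stack_trace.strip().splitlines()
    title = lines[-1] if lines else "Unknown error"
    number = tracker.create_issue(title=title, body=stack_trace)
    tracker.add_labels(number, ["bug", "triaged"])
    tracker.assign(number, owner)
    return {"number": number, "title": title}

tracker = FakeTracker()
trace = "Traceback (most recent call last):\n  ...\nValueError: bad token"
result = create_triaged_bug(tracker, trace, owner="alice")
print(result)              # {'number': 1, 'title': 'ValueError: bad token'}
print(len(tracker.calls))  # 3
```

The agent pays for one definition instead of three, and it no longer has to work out the right order of low-level calls.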
If you own the server, design for intent. If you're consuming someone else's overly granular server, the editorial and structural fixes above are your levers until they redesign it.
Checklist
- Measure. Count per-tool token cost; sort by cost; trim the heaviest first. Re-count after each change.
- Trim descriptions to "what it does + when to call it." Drop marketing prose, restatements, and long examples.
- Remove redundant or overlapping tools.
- Prune input schemas — long enums, deep nesting, boilerplate, unnecessary optional params.
- Enable tool search / load-on-demand if your client supports it, so unused tools stay out of context.
- Consider code execution with MCP for workflows that chain many tool calls.
- Don't wrap every API endpoint. Curate to a few intent-shaped tools.
- Verify versions. Confirm spec and feature specifics against the current MCP spec and your vendor's docs — the ecosystem moves fast.
Measure Your Tool Definitions
Start with the MCP tool token counter — paste your tools' name, description, and inputSchema to see the token cost per tool and in total, so you know which definitions to trim first. It runs entirely in your browser, so you can paste internal tool schemas without sending them anywhere. To count tokens for the prompts and outputs that share the same window, use the general token counter.
Frequently Asked Questions
Why do MCP tools use up context tokens before I even send a message?
Because an MCP client fetches each connected server's tools via tools/list and includes every tool's name, description, and inputSchema (a JSON Schema) in the model's context on every request. That happens before the user types anything, so it's a fixed cost paid on each turn. Ten verbose tools across a few servers can occupy a large share of the window before the actual conversation starts. The fix is to make each definition smaller (trim descriptions and schemas) or to stop sending all of them upfront (tool search / load-on-demand).
How many tokens do MCP tool definitions actually cost?
Per-tool cost commonly ranges from a couple hundred tokens to well over a thousand, depending on how long the description is and how deep the input schema goes. At the high end, reported measurements show connecting several popular servers (for example GitHub plus Slack plus Sentry) consuming roughly 70% of a 200K-token context window in tool definitions alone, before the first query. Treat that as a reported measurement, not a guarantee, since it depends heavily on which servers and how many tools. The only way to know your real number is to count it: use a tool like the MCP tool token counter to find the heaviest definitions, then trim those first.
Does having too many MCP tools hurt accuracy, not just token count?
Yes — and this is often the bigger problem. Tool-selection accuracy degrades as the number of tools grows because the model has more lookalike options to choose between. One widely-cited 2026 result reported selection accuracy falling from roughly 43% to under 14% with a bloated tool set; treat that as reported research rather than a fixed law, but the direction is consistent across reports. Practical clients also start to struggle past roughly a few dozen tools — Cursor's practical ceiling is around 40. Clear, well-scoped tool descriptions improve selection accuracy, so trimming is a double win: fewer tokens and better choices.
What is the single biggest lever for cutting MCP token usage?
Editorial trimming, because it's the cheapest and most immediate. Cut each tool description down to "what it does and when to call it," and delete marketing prose, redundant restatements, and long inline examples. Then remove redundant or overlapping tools entirely and prune verbose input schemas: long enums, deep nesting, repeated boilerplate, and unnecessary optional parameters. After editorial trimming, the structural features (tool search / load-on-demand and code execution with MCP) give larger reductions but require client or server support. Architecturally, don't expose every API endpoint as a tool, since that's what causes the explosion in the first place.
What are tool search and code execution with MCP, and how much do they save?
Both are structural approaches introduced in late 2025 to avoid putting every tool definition in context upfront. With tool search / load-on-demand, the client searches for relevant tools and loads only those. Anthropic reported a case dropping from ~77k to ~8.7k tokens, and it's now the default in Claude Code once tools exceed about 10% of context. With code execution with MCP, the model calls tools by writing code in a sandbox instead of emitting one definition-heavy tool call at a time. Anthropic reported ~150k to ~2k tokens (a ~98.7% reduction) in one example. These are Anthropic-reported figures; your savings depend on your tool set and workload. Availability varies by client, so confirm against the current MCP spec and your vendor's docs.