The Hidden Token Cost of AI Tools
My first real lesson from testing OpenAI + Zapier MCP: the more tools you enable, the more tokens you use.
I enabled every tool option in Google Calendar, Gmail, Slack, and Google Sheets - around 60 tools in total. Sending a short email used 17,000 tokens. That’s a lot.
I tried again with only 5 tools enabled. Sending the same email cost 4,000 tokens.
Why this happens
Every enabled tool’s full description - name, parameters, usage instructions - is included in the model’s context on every single request. The model needs to know what tools are available before it can decide which ones to use. More tools means more context, which means more tokens - whether the model actually calls those tools or not.
At 60 tools, you’re burning ~13,000 tokens of overhead before the model even reads your prompt. That’s roughly $0.01 per request at current API prices. It sounds small until you realize that an agent taking 10 actions on a task just spent $0.10 on tool descriptions alone.
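To see where that overhead comes from, here is a minimal sketch. The tool schema below is hypothetical (the name, description, and parameters are illustrative, not Zapier’s actual payload), and the ~4-characters-per-token rule is a rough heuristic, not a real tokenizer - but it shows how 60 descriptions of a couple hundred tokens each add up to roughly the observed overhead.

```python
import json

# Hypothetical tool schema - illustrative shape only, not the real MCP payload.
TOOLS = [
    {
        "name": "gmail_send_email",
        "description": "Send an email via Gmail. Requires a recipient, subject, and body.",
        "parameters": {
            "to": "string, recipient email address",
            "subject": "string, subject line",
            "body": "string, email body text",
        },
    },
    # ... imagine ~60 of these across Calendar, Gmail, Slack, and Sheets
]

def estimate_tokens(obj) -> int:
    """Rough heuristic: ~4 characters per token for English JSON."""
    return len(json.dumps(obj)) // 4

per_tool = estimate_tokens(TOOLS[0])
overhead = 60 * per_tool  # sent on every request, used or not
```

With each description landing in the low hundreds of tokens, 60 of them put the fixed overhead in the same ~13,000-token range as the experiment above.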
This is the same problem that shows up everywhere in AI systems: the cost of context isn’t what you use, it’s what you include. Every token in the context window has a cost, and irrelevant tokens aren’t just wasteful - they can actively degrade output quality by diluting the model’s attention.
The fix is retrieval, not brute force
The brute-force approach is to manually curate which tools are enabled for each use case. That works for a single user, but it doesn’t scale. You can’t ask every user to configure their tool set for every type of task.
A better approach:
- User sends a prompt
- A lightweight retrieval step identifies which tools are actually relevant to that specific request
- Only those tool descriptions get included in the context
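The steps above can be sketched in a few lines. Everything here is illustrative: the tool names are made up, and the word-overlap score is a crude stand-in for the embedding similarity a real retrieval step would use.

```python
# Hypothetical tool descriptions - names and text are illustrative only.
TOOLS = [
    {"name": "gmail_send_email", "description": "Send an email message via Gmail"},
    {"name": "gcal_create_event", "description": "Create an event in Google Calendar"},
    {"name": "slack_post_message", "description": "Post a message to a Slack channel"},
    {"name": "sheets_append_row", "description": "Append a row to a Google Sheets spreadsheet"},
]

def score(prompt: str, tool: dict) -> int:
    """Word-overlap relevance score: a crude stand-in for embedding
    similarity. A production system would use a vector index instead."""
    return len(set(prompt.lower().split()) & set(tool["description"].lower().split()))

def select_tools(prompt: str, tools: list, k: int = 2) -> list:
    """Keep only the k most relevant tool descriptions for this request."""
    return sorted(tools, key=lambda t: score(prompt, t), reverse=True)[:k]

# Only the selected tools' descriptions go into the model's context.
chosen = select_tools("send an email to the team about the launch", TOOLS)
```

For the email prompt, the Gmail tool ranks first and the other 59 descriptions never enter the context - which is exactly where the 13,000 tokens of savings come from.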
This is a retrieval problem, not a configuration problem. And it’s the same retrieval problem that shows up with documents, memories, instructions - any context that an AI system might need. The question is always the same: how do you include what’s relevant and exclude what isn’t, before the model starts working?
I didn’t fully appreciate this at the time, but this experiment planted a seed. The token cost of tools was the first concrete example I encountered of a broader pattern: the most impactful optimization in AI systems isn’t making the model smarter - it’s making sure the model’s context contains exactly what it needs and nothing else.
Update (March 2026): Anthropic has since shipped a system in Claude that loads tool descriptions on demand rather than including all of them on every request - essentially solving this specific problem. It’s a good validation that the cost was real and worth fixing. The broader principle still applies everywhere else: documents, memories, instructions, any context that gets included by default rather than retrieved selectively.