Function Calling Cost Calculator
Estimate the cost of an agentic workflow that makes N tool calls per query.
Quick Answer
Each tool call re-sends the full conversation history. A 3-tool-call query with a 5K-token system + tool definitions costs 4-6x a single-shot query. Tool definitions typically run 200-500 tokens each. Prompt caching on stable prefixes recovers 60-90% of the input cost.
| Model | Input tokens/query | Cost/query | Monthly (10,000 queries) |
|---|---|---|---|
| GPT-4o | 25,900 | $0.0712 | $711.50 |
| GPT-4o-mini | 25,900 | $0.0043 | $42.69 |
| Claude Sonnet 4.6 | 25,900 | $0.0873 | $873.00 |
| Claude Haiku 4.5 | 25,900 | $0.0291 | $291.00 |
| Gemini 2.5 Pro | 25,900 | $0.0356 | $355.75 |
| Gemini 2.5 Flash | 25,900 | $0.0094 | $93.70 |
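A minimal sketch of the arithmetic behind a table like the one above, assuming roughly 25,900 input tokens and about 650 output tokens per query, 10,000 queries per month, and the per-million-token list prices shown below; all of these are assumptions to check against current provider pricing before relying on the numbers.

```python
# Rough per-query and monthly cost for an agentic query.
# Prices are USD per million tokens; treat them as assumptions and verify
# against each provider's current pricing page.
PRICES = {
    "GPT-4o":           {"in": 2.50, "out": 10.00},
    "GPT-4o-mini":      {"in": 0.15, "out": 0.60},
    "Claude Sonnet":    {"in": 3.00, "out": 15.00},
    "Claude Haiku":     {"in": 1.00, "out": 5.00},
    "Gemini 2.5 Pro":   {"in": 1.25, "out": 10.00},
    "Gemini 2.5 Flash": {"in": 0.30, "out": 2.50},
}

INPUT_TOKENS = 25_900      # total input tokens across all iterations of one query
OUTPUT_TOKENS = 650        # tool-call JSON plus the final answer (assumption)
QUERIES_PER_MONTH = 10_000

for model, p in PRICES.items():
    per_query = (INPUT_TOKENS * p["in"] + OUTPUT_TOKENS * p["out"]) / 1_000_000
    print(f"{model:17} ${per_query:.4f}/query  ${per_query * QUERIES_PER_MONTH:,.2f}/month")
```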
About This Tool
The Function Calling Cost Calculator estimates the true cost of agentic LLM workflows where the model makes one or more tool calls before producing a final answer. Standard cost calculators understate this — they don't account for the conversation history accumulating across iterations.
Why agent costs balloon
Every tool call triggers a fresh model invocation. The full conversation — system prompt, tool definitions, user query, all prior tool calls, all prior tool results — gets re-sent on every step. A 3-tool-call query with a 5000-token system prompt and 3000-token tool catalog ends up sending 30K+ tokens to the model across iterations, even though the user's actual question was only 200 tokens.
The math, step by step
Iteration 1 sends the system prompt, tool definitions, and user query, and gets a tool-call JSON back. Iteration 2 re-sends all of that plus the first tool call and its result, and gets another tool call back. Iteration N+1 re-sends all prior context plus the last tool result and returns the final answer. Total input grows roughly quadratically with the number of tool calls, since each iteration re-sends everything before it. For a 5-tool-call query with a 5K-token static prefix, you can easily hit 50K total input tokens.
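A minimal sketch of that accumulation; the static-prefix, query, and per-round-trip token sizes are assumptions, so substitute your own measurements.

```python
# Estimate total input tokens across an N-tool-call agent query.
# Every iteration re-sends the full conversation so far, so total input
# grows roughly quadratically with the number of tool calls.
def total_input_tokens(tool_calls: int,
                       static_prefix: int = 5_000,    # system prompt + tool definitions
                       user_query: int = 200,
                       per_round_trip: int = 1_000) -> int:  # tool-call JSON + tool result
    total = 0
    context = static_prefix + user_query
    for _ in range(tool_calls + 1):    # N tool-call iterations + 1 final-answer iteration
        total += context               # this iteration re-sends everything so far
        context += per_round_trip      # the next iteration carries this round trip too
    return total

for n in (1, 3, 5):
    print(f"{n} tool calls -> ~{total_input_tokens(n):,} input tokens")
# With the default assumptions: 1 -> ~11,400   3 -> ~26,800   5 -> ~46,200
```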
Cost-reduction levers
- Prompt caching: cache the system prompt and tool definitions. Anthropic's 90% discount on cache reads turns a $0.015 input cost per iteration into $0.0015; across 5 iterations that is roughly $0.07 saved per query. A sketch follows this list.
- Tool routing: don't carry all 50 tools on every call. Use a router model to pick the 3-5 tools relevant to the query, then run the agent with just those.
- Parallel tool calls: most providers support multiple tool calls per turn. Use them when the calls are independent, so the shared context is re-sent once per turn instead of once per call.
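A sketch of the caching lever using Anthropic's prompt caching, where a `cache_control` marker on the system prompt and on the last tool definition caches the whole static prefix. The tool schema, model ID, and prompt below are illustrative assumptions, not a drop-in implementation.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = "You are a support agent..."  # stands in for the ~5K-token static prefix

TOOLS = [
    {
        "name": "search_orders",  # hypothetical tool, for illustration only
        "description": "Look up a customer's orders by email address.",
        "input_schema": {
            "type": "object",
            "properties": {"email": {"type": "string"}},
            "required": ["email"],
        },
        # A cache_control marker on the last tool caches every tool definition above it.
        "cache_control": {"type": "ephemeral"},
    },
]

response = client.messages.create(
    model="claude-haiku-4-5",  # model ID is an assumption; use whatever you deploy
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # cache the system prompt too
        }
    ],
    tools=TOOLS,
    messages=[{"role": "user", "content": "Where is order 4821?"}],
)
# On iterations 2..N+1 the cached prefix bills at the cache-read rate
# (roughly 10% of the normal input price) instead of the full input price.
```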
Model selection for agents
Default to a fast, cheap model (Haiku 4.5, GPT-4o-mini, Gemini Flash) for tool routing and execution, and escalate to Sonnet, GPT-4o, or Gemini Pro only when the query requires deeper reasoning. A typical agent stack runs 80% of its work on the cheap tier and 20% on the flagship, which cuts spend by roughly 50-75% versus running everything on the flagship, depending on how large the price gap between the two tiers is.
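Worked numbers for that split, using the per-query costs from the table above; the 80/20 ratio is an assumption about your traffic mix.

```python
# Blended per-query cost of a two-tier agent stack vs. running the flagship for everything.
def blended_savings(cheap_cost: float, flagship_cost: float, cheap_share: float = 0.8) -> float:
    blended = cheap_share * cheap_cost + (1 - cheap_share) * flagship_cost
    return 1 - blended / flagship_cost

# Per-query costs taken from the table above.
print(f"Haiku/Sonnet split:  {blended_savings(0.0291, 0.0873):.0%} saved")   # ~53%
print(f"mini/GPT-4o split:   {blended_savings(0.0043, 0.0712):.0%} saved")   # ~75%
print(f"Flash/Gemini Pro:    {blended_savings(0.0094, 0.0356):.0%} saved")   # ~59%
```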
Reliability vs cost
Cheap models fail tool calls more often: wrong arguments, hallucinated tool names, malformed JSON. Each failure forces a retry, which compounds cost. A model that's 50% cheaper but needs twice as many attempts costs the same. Always evaluate quality-adjusted cost on real agent traces.
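One way to put a number on that, assuming a failed tool call is detected and simply retried; the per-query costs and success rates below are illustrative assumptions, not measurements.

```python
# Quality-adjusted cost: expected cost per successful query when failures force retries.
def expected_cost(cost_per_attempt: float, success_rate: float) -> float:
    # Geometric retries: on average 1 / success_rate attempts per successful query.
    return cost_per_attempt / success_rate

flagship = expected_cost(0.0873, success_rate=0.98)  # illustrative per-query costs
cheap    = expected_cost(0.0291, success_rate=0.80)
print(f"flagship: ${flagship:.4f}/success   cheap: ${cheap:.4f}/success")
# The cheap model is 3x cheaper per attempt but only ~2.4x cheaper per successful query here.
```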
Pair with the prompt caching savings calculator, LLM cost comparison, LLM latency calculator, and Claude cost calculator. For total agent budget, see AI monthly budget calculator.
Frequently Asked Questions
Why do agentic workflows cost so much more than single calls?
How much does a tool definition cost in tokens?
Can prompt caching help with agent workflows?
What's the cheapest way to run a multi-tool agent?
How do tool results affect output token cost?
You might also like
Base64 Encoder & Decoder
Encode and decode Base64 with URL-safe mode and file upload.
LLM Cost Comparison
Side-by-side monthly cost across 12 major LLMs from every major provider.
Whisper API Cost Calculator
OpenAI Whisper at $0.006/min plus Deepgram and AssemblyAI comparison.