
Function Calling Cost Calculator

Estimate agentic workflow cost with N tool calls per query.

Quick Answer

Each tool call re-sends the full conversation history. A 3-tool-call query with a 5K-token system + tool definitions costs 4-6x a single-shot query. Tool definitions typically run 200-500 tokens each. Prompt caching on stable prefixes recovers 60-90% of the input cost.

Agent overhead: input tokens are 11.8x the single-shot equivalent due to repeated context across 4 iterations.
| Model | Input tokens | Per query | Monthly |
|---|---|---|---|
| GPT-4o | 25,900 | $0.0712 | $711.50 |
| GPT-4o-mini | 25,900 | $0.0043 | $42.69 |
| Claude Sonnet 4.6 | 25,900 | $0.0873 | $873.00 |
| Claude Haiku 4.5 | 25,900 | $0.0291 | $291.00 |
| Gemini 2.5 Pro | 25,900 | $0.0356 | $355.75 |
| Gemini 2.5 Flash | 25,900 | $0.0094 | $93.70 |
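
A minimal sketch of how figures like these can be reproduced. The per-million-token rates below are illustrative assumptions, not published prices, and the 10,000-queries-per-month multiplier is inferred from the ratio of the Monthly and Per-query columns. The sketch counts input tokens only, so it slightly undershoots the table, which presumably also counts output tokens.

```python
# Sketch: reproduce a cost table from token counts and pricing.
# Rates are illustrative assumptions (USD per million input tokens),
# not authoritative prices -- check your provider's current price sheet.
PRICE_PER_M_INPUT = {
    "GPT-4o": 2.50,
    "GPT-4o-mini": 0.15,
    "Claude Sonnet 4.6": 3.00,
    "Claude Haiku 4.5": 1.00,
    "Gemini 2.5 Pro": 1.25,
    "Gemini 2.5 Flash": 0.30,
}

INPUT_TOKENS = 25_900          # from the calculator scenario above
QUERIES_PER_MONTH = 10_000     # assumption inferred from the Monthly column

for model, rate in PRICE_PER_M_INPUT.items():
    per_query = INPUT_TOKENS / 1_000_000 * rate
    print(f"{model:>20}: ${per_query:.4f}/query  "
          f"${per_query * QUERIES_PER_MONTH:,.2f}/month")
```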

About This Tool

The Function Calling Cost Calculator estimates the true cost of agentic LLM workflows where the model makes one or more tool calls before producing a final answer. Standard cost calculators understate this — they don't account for the conversation history accumulating across iterations.

Why agent costs balloon

Every tool call triggers a fresh model invocation. The full conversation — system prompt, tool definitions, user query, all prior tool calls, all prior tool results — gets re-sent on every step. A 3-tool-call query with a 5000-token system prompt and 3000-token tool catalog ends up sending 30K+ tokens to the model across iterations, even though the user's actual question was only 200 tokens.
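
To make the accumulation concrete, here's a toy simulation of that 3-tool-call query. The per-call and per-result token sizes are assumptions chosen for illustration, not measurements.

```python
# Toy simulation: input tokens re-sent across a 3-tool-call agent loop.
# All sizes are illustrative assumptions.
SYSTEM_PROMPT = 5_000   # static system prompt tokens
TOOL_CATALOG = 3_000    # static tool definition tokens
USER_QUERY = 200
TOOL_CALL = 150         # assumed tokens per tool-call JSON the model emits
TOOL_RESULT = 700       # assumed tokens per tool result fed back in

context = SYSTEM_PROMPT + TOOL_CATALOG + USER_QUERY
total_input = 0
for step in range(4):                     # 3 tool calls + 1 final answer
    total_input += context                # full history re-sent each step
    context += TOOL_CALL + TOOL_RESULT    # history grows before next step

print(total_input)  # 37,900 tokens for a 200-token question
```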

The math, step by step

Iteration 1: system + tools + query in, tool call JSON out. Iteration 2: same input plus tool result, tool call out. Iteration N+1: all prior context plus the last tool result, final answer out. Total input grows roughly quadratically, since each iteration re-sends everything before it. For a 5-tool-call query with a 5K-token static prefix, you can easily hit 50K total input tokens.
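
That growth has a simple closed form: with a static prefix P (system + tools + query) and a per-step delta d (one tool call plus one result), total input across n tool calls is (n+1)·P + d·n(n+1)/2. A quick sketch, where the ~1.3K-token delta is an assumption:

```python
def total_input_tokens(n_calls: int, prefix: int, delta: int) -> int:
    """Total input tokens across n_calls tool calls plus the final answer.

    Each of the n_calls + 1 invocations re-sends the static prefix, and
    invocation k additionally re-sends the k-1 deltas accumulated so far.
    """
    return (n_calls + 1) * prefix + delta * n_calls * (n_calls + 1) // 2

# 5 tool calls, 5K static prefix, ~1.3K tokens per call+result pair (assumed)
print(total_input_tokens(5, 5_000, 1_300))  # 49,500 -- right at the ~50K mark
```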

Cost-reduction levers

Prompt caching: cache the system prompt and tool definitions. Anthropic's 90% discount on cache reads turns a $0.015 input cost per iteration into $0.0015. Across 5 iterations that's roughly $0.07 saved per query.

Tool routing: don't carry all 50 tools on every call. Use a router model to pick the 3-5 tools relevant to the query, then run the agent with just those.

Parallel tools: most providers support multiple tool calls per turn. Use them when the calls are independent.
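
A sketch of the caching arithmetic, assuming Anthropic-style pricing where a cache write costs 1.25x the base input rate and a cache read costs 0.1x. Note the write premium on the first iteration trims the headline savings slightly.

```python
def agent_prefix_cost(prefix_tokens: int, iterations: int, rate_per_m: float,
                      cached: bool) -> float:
    """Input cost (USD) of the static prefix across an agent loop.

    Assumes Anthropic-style cache pricing: writes at 1.25x the base input
    rate, reads at 0.1x. Ignores the growing tool-call history, which is
    billed at the base rate either way.
    """
    base = prefix_tokens / 1_000_000 * rate_per_m
    if not cached:
        return base * iterations
    # first iteration writes the cache, the rest read it
    return base * 1.25 + base * 0.10 * (iterations - 1)

# 5K-token prefix, 5 iterations, $3/M input (illustrative Sonnet-class rate)
print(agent_prefix_cost(5_000, 5, 3.0, cached=False))  # $0.075
print(agent_prefix_cost(5_000, 5, 3.0, cached=True))   # ~$0.025
```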

Model selection for agents

Default to a fast cheap model (Haiku 4.5, GPT-4o-mini, Gemini Flash) for tool routing and execution. Escalate to Sonnet, GPT-4o, or Gemini Pro only when the query requires deeper reasoning. A typical agent stack runs 80% of work on Haiku and 20% on Sonnet, saving 70-80% versus running everything on the flagship.
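
The blended-cost arithmetic behind that claim, assuming GPT-4o-mini at $0.15/M and GPT-4o at $2.50/M input tokens (verify against the current price sheet):

```python
# Blended input cost of a two-tier agent stack vs. flagship-only.
# Rates assume GPT-4o-mini at $0.15/M and GPT-4o at $2.50/M input tokens.
CHEAP_RATE, FLAGSHIP_RATE = 0.15, 2.50
TOKENS_PER_QUERY = 25_900
CHEAP_SHARE = 0.80             # fraction of work the cheap tier handles

cheap = TOKENS_PER_QUERY / 1e6 * CHEAP_RATE
flagship = TOKENS_PER_QUERY / 1e6 * FLAGSHIP_RATE

blended = CHEAP_SHARE * cheap + (1 - CHEAP_SHARE) * flagship
print(f"${blended:.4f} blended vs ${flagship:.4f} flagship-only")
print(f"saved: {1 - blended / flagship:.0%}")   # ~75%
```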

Reliability vs cost

Cheaper models fail tool calls more often: wrong arguments, hallucinated tool names, malformed JSON. Each failure forces a retry, which compounds cost. A model that's 50% cheaper but fails twice as often costs about the same once retries are counted. Always evaluate quality-adjusted cost on real agent traces.
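
One way to put a number on that: divide per-attempt cost by the tool-call success rate to get the expected cost per successful call, assuming each failure is retried from scratch. A sketch with made-up rates chosen to illustrate the break-even:

```python
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected cost per successful tool call, assuming each failure is
    retried from scratch (geometric number of attempts)."""
    return cost_per_attempt / success_rate

# Made-up numbers illustrating the break-even above: a model that's 50%
# cheaper but needs twice the attempts costs the same per success.
flagship = cost_per_success(0.010, 0.98)   # $0.0102
cheap    = cost_per_success(0.005, 0.49)   # $0.0102 -- no savings at all
print(flagship, cheap)
```

In a real agent loop the penalty is worse than this sketch suggests, since every failed attempt also burns the full re-sent context.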

Pair with the prompt caching savings calculator, LLM cost comparison, LLM latency calculator, and Claude cost calculator. For total agent budget, see AI monthly budget calculator.

Frequently Asked Questions

Why do agentic workflows cost so much more than single calls?
Each tool call triggers a fresh model invocation. The full conversation history (initial prompt + each tool call + each tool result) gets re-sent on every step. A 5-tool-call query can cost 3-8x a single shot because input tokens compound across iterations.
How much does a tool definition cost in tokens?
Each function definition (name, description, JSON schema) typically runs 200-500 tokens. A 10-tool agent might carry 3-5K tokens of definitions on every call. They're sent on every iteration unless you trim the toolset based on context.
Can prompt caching help with agent workflows?
Yes — significantly. Cache the system prompt + tool definitions. Anthropic's cache cuts that prefix to 10% on reads. A 5K-token tool catalog cached across 10 iterations saves $0.05-$0.50 per query depending on model.
What's the cheapest way to run a multi-tool agent?
Haiku 4.5 or GPT-4o-mini for routing + tool execution, escalating to Sonnet/GPT-4o only when reasoning requires it. Add prompt caching on stable prefixes. Use parallel tool calls where possible — most providers support concurrent function calls in a single turn.
How do tool results affect output token cost?
Tool results count as input on the next iteration. Output cost is just the model's text response and tool call requests — usually small. The bigger cost driver is repeated input tokens accumulating across iterations.