Dev Tools

Gemini API Cost Calculator

Compare Gemini 2.5 Pro, Flash, and Flash-Lite costs with long-context pricing.

Quick Answer

Gemini 2.5 Pro is $1.25 / $5 (or $2.50 / $10 above 200K tokens). Flash is $0.30 / $2.50. Flash-Lite is $0.10 / $0.40. Pro's 1M-token context window enables full-document stuffing without RAG.

| Model | In rate | Out rate | Per call | Monthly |
|---|---|---|---|---|
| Gemini 2.5 Pro (1M context, top reasoning) | $1.25/M | $5.00/M | $0.006500 | $65.00 |
| Gemini 2.5 Flash (fast, multimodal) | $0.30/M | $2.50/M | $0.002600 | $26.00 |
| Gemini 2.5 Flash-Lite (cheapest, classification) | $0.10/M | $0.40/M | $0.000520 | $5.20 |
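The table's per-call and monthly figures can be reproduced with a short script. The per-request defaults below (2,000 input tokens, 800 output tokens, 10,000 requests per month) are inferred from the listed totals, not stated by the calculator itself:

```python
# Reproduce the table's per-call and monthly figures.
# Assumed defaults (inferred from the table): 2,000 input tokens,
# 800 output tokens, 10,000 requests per month.

RATES = {  # $ per million tokens: (input, output)
    "gemini-2.5-pro": (1.25, 5.00),
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-2.5-flash-lite": (0.10, 0.40),
}

def per_call_cost(model, input_tokens, output_tokens):
    """Dollar cost of a single request at the standard-tier rates."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

for model in RATES:
    call = per_call_cost(model, 2_000, 800)
    print(f"{model}: ${call:.6f}/call, ${call * 10_000:.2f}/month")
```

Note this sketch uses the standard-tier rates only; Pro's long-context surcharge above 200K input tokens is covered below.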

About This Tool

The Gemini API Cost Calculator handles all three Gemini 2.5 tiers — Pro, Flash, and Flash-Lite — including the long-context pricing kink that hits Pro at 200K input tokens. Enter token counts and request volume, and the tool computes per-call and monthly cost across the lineup.

Gemini pricing (April 2026)

Gemini 2.5 Pro: $1.25 input / $5 output per million tokens up to 200K input tokens. Beyond 200K, the rate doubles to $2.50 / $10 to reflect the cost of attending over very long contexts. Gemini 2.5 Flash: $0.30 / $2.50 with a 1M-token context window. Flash-Lite: $0.10 / $0.40 — the cheapest production-grade model from any major lab.
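Pro's two-tier pricing can be sketched as a small function. One assumption to flag: this treats the 200K threshold as switching the rate for the whole request, the way Google's price list applies it, rather than charging the higher rate only on the marginal tokens:

```python
# Sketch of Gemini 2.5 Pro's long-context pricing tier.
# Assumption: once input exceeds 200K tokens, the higher rate
# applies to the entire request, not just the overflow.

LONG_CONTEXT_THRESHOLD = 200_000

def pro_cost(input_tokens, output_tokens):
    """Dollar cost of one Gemini 2.5 Pro call, tier-aware."""
    if input_tokens <= LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = 1.25, 5.00    # standard tier, $/M tokens
    else:
        in_rate, out_rate = 2.50, 10.00   # long-context tier
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(pro_cost(100_000, 2_000))  # 0.135 — standard tier
print(pro_cost(300_000, 2_000))  # 0.77 — long-context tier
```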

The 1M-token context window

Pro's defining feature is its 1 million token context. That fits roughly 1,500 pages of text or 50 hours of audio transcript. For document Q&A and codebase analysis, this lets you skip RAG entirely and stuff the full corpus into the prompt — at the cost of higher per-call spend and higher latency. Once you account for retrieval infrastructure, the economics roughly break even with RAG around 50K-100K total corpus tokens; below that, stuffing is usually cheaper and simpler.
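A rough per-call comparison makes the trade-off concrete. The numbers below are illustrative assumptions, not measurements: RAG is modeled as retrieving ~8K tokens of chunks per query, retrieval infrastructure cost is ignored, and output cost (identical either way) is omitted:

```python
# Stuffing vs. RAG input cost on Gemini 2.5 Pro.
# Assumptions (hypothetical): RAG sends ~8K tokens of retrieved chunks
# per query; retrieval infra cost and output cost are ignored.

def input_cost(tokens):
    """Input-side dollar cost on Pro's two-tier pricing."""
    rate = 1.25 if tokens <= 200_000 else 2.50  # $/M tokens
    return tokens * rate / 1_000_000

RAG_CONTEXT = 8_000
for corpus in (50_000, 100_000, 500_000):
    print(f"{corpus:>7}-token corpus: "
          f"stuffing ${input_cost(corpus):.4f}/call vs "
          f"RAG ${input_cost(RAG_CONTEXT):.4f}/call")
```

At small corpus sizes the per-call gap is fractions of a cent, which is easily outweighed by not running a retrieval pipeline; at 500K tokens stuffing also crosses into the long-context tier.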

When to pick which Gemini

Pro for hard reasoning, long documents, and multimodal work. Flash for everyday chat and structured extraction at scale — it's often the best price-per-performance option in the market. Flash-Lite for ultra-high-volume classification, routing, and lightweight agents where the cheapest token is the right token.

Compare against alternatives: GPT API cost calculator, Claude API cost calculator, and the LLM cost comparison. For long-context budgeting, see the context window calculator. Estimate token counts with the token counter.

Frequently Asked Questions

How does Gemini 2.5 Pro long-context pricing work?
Pro has a tiered structure. Up to 200K input tokens: $1.25 input / $5 output per million. Above 200K: $2.50 / $10. Most prompts stay under the threshold, but RAG over large document sets can cross it — model accordingly.
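The threshold behaves as a step, not a ramp — worth modeling explicitly, since a single token can double the input bill (assuming, as Google's price list does, that the higher rate applies to the whole request once input exceeds 200K):

```python
# Cost jump at the 200K threshold. Assumption: the higher rate applies
# to the entire request once input exceeds 200K tokens.

def pro_input_cost(tokens):
    rate = 1.25 if tokens <= 200_000 else 2.50  # $/M input tokens
    return tokens * rate / 1_000_000

print(pro_input_cost(200_000))  # 0.25
print(pro_input_cost(200_001))  # ~0.50 — one extra token doubles input cost
```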
Which Gemini model is cheapest?
Flash-Lite at $0.10 input / $0.40 output per million tokens. It's roughly 12x cheaper than Pro on both input and output, and 3x cheaper than Flash on input (about 6x on output). Use it for classification, simple extraction, and agentic tool routing where deep reasoning isn't needed.
How does Gemini compare to GPT-4o on cost?
Gemini 2.5 Pro ($1.25/$5) is roughly half the price of GPT-4o ($2.50/$10). Gemini 2.5 Flash ($0.30/$2.50) sits between GPT-4o-mini and GPT-4o. Flash-Lite undercuts everything in the cost-per-token race.
What's special about Gemini's 1M context window?
Gemini 2.5 Pro accepts up to 1 million tokens of input — roughly 750K English words or 1500 pages. This enables document-stuffing patterns where you skip RAG entirely and just paste the corpus. The trade-off is latency and the long-context price tier above 200K.
Does Google offer Gemini context caching?
Yes. Context caching drops cached input cost. Discounts vary by model and storage duration but commonly range from 25-75% off cached portions. Useful when you have a stable 32K+ token prefix reused across many requests.
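The savings from caching a stable prefix can be sketched as follows. The 75%-off figure is an assumption from the discount range above, and per-hour cache storage fees (billed separately) are omitted:

```python
# Context caching savings sketch on Gemini 2.5 Pro input pricing.
# Assumptions (hypothetical): cached tokens billed at 25% of the full
# input rate (75% off); cache storage fees are ignored.

IN_RATE = 1.25          # $/M input tokens, Pro standard tier
CACHED_FRACTION = 0.25  # assumed billing rate for cached tokens

def cached_input_cost(prefix_tokens, suffix_tokens, cached):
    """Input cost when a shared prefix may be served from cache."""
    prefix_rate = IN_RATE * CACHED_FRACTION if cached else IN_RATE
    return (prefix_tokens * prefix_rate + suffix_tokens * IN_RATE) / 1_000_000

# 32K-token system prompt + 500-token user query, with and without caching:
print(f"${cached_input_cost(32_000, 500, cached=False):.6f} uncached")
print(f"${cached_input_cost(32_000, 500, cached=True):.6f} cached")
```

Across thousands of requests reusing the same prefix, the discount compounds into most of the input bill.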