Dev Tools

Gemini API Cost Calculator

Compare Gemini 2.5 Pro, Flash, and Flash-Lite costs with long-context pricing.

Quick Answer

Gemini 2.5 Pro is $1.25 / $5 (or $2.50 / $10 above 200K tokens). Flash is $0.30 / $2.50. Flash-Lite is $0.10 / $0.40. Pro's 1M-token context window enables full-document stuffing without RAG.

| Model | In rate | Out rate | Per call | Monthly |
|---|---|---|---|---|
| Gemini 2.5 Pro (1M context, top reasoning) | $1.25/M | $5.00/M | $0.006500 | $65.00 |
| Gemini 2.5 Flash (fast, multimodal) | $0.30/M | $2.50/M | $0.002600 | $26.00 |
| Gemini 2.5 Flash-Lite (cheapest, classification) | $0.10/M | $0.40/M | $0.000520 | $5.20 |
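The table's per-call and monthly figures can be reproduced with a short script. The per-request defaults below (2,000 input tokens, 800 output tokens, 10,000 requests per month) are inferred from the listed totals, not stated by the calculator itself:

```python
# Reproduce the table's per-call and monthly figures.
# Assumed defaults (inferred from the table): 2,000 input tokens,
# 800 output tokens, 10,000 requests per month.

RATES = {  # $ per million tokens: (input, output)
    "gemini-2.5-pro": (1.25, 5.00),
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-2.5-flash-lite": (0.10, 0.40),
}

def per_call_cost(model, input_tokens, output_tokens):
    """Dollar cost of a single request at the standard-tier rates."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

for model in RATES:
    call = per_call_cost(model, 2_000, 800)
    print(f"{model}: ${call:.6f}/call, ${call * 10_000:.2f}/month")
```

Note this sketch uses the standard-tier rates only; Pro's long-context surcharge above 200K input tokens is covered below.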

About This Tool

The Gemini API Cost Calculator handles all three Gemini 2.5 tiers — Pro, Flash, and Flash-Lite — including the long-context pricing kink that hits Pro at 200K input tokens. Enter token counts and request volume, and the tool computes per-call and monthly cost across the lineup.

Gemini pricing (April 2026)

Gemini 2.5 Pro: $1.25 input / $5 output per million tokens up to 200K input tokens. Beyond 200K, the rate doubles to $2.50 / $10 to reflect the cost of attending over very long contexts. Gemini 2.5 Flash: $0.30 / $2.50 with a 1M-token context window. Flash-Lite: $0.10 / $0.40 — the cheapest production-grade model from any major lab.
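Pro's two-tier pricing can be sketched as a small function. One assumption to flag: this treats the 200K threshold as switching the rate for the whole request, the way Google's price list applies it, rather than charging the higher rate only on the marginal tokens:

```python
# Sketch of Gemini 2.5 Pro's long-context pricing tier.
# Assumption: once input exceeds 200K tokens, the higher rate
# applies to the entire request, not just the overflow.

LONG_CONTEXT_THRESHOLD = 200_000

def pro_cost(input_tokens, output_tokens):
    """Dollar cost of one Gemini 2.5 Pro call, tier-aware."""
    if input_tokens <= LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = 1.25, 5.00    # standard tier, $/M tokens
    else:
        in_rate, out_rate = 2.50, 10.00   # long-context tier
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(pro_cost(100_000, 2_000))  # 0.135 — standard tier
print(pro_cost(300_000, 2_000))  # 0.77 — long-context tier
```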

The 1M-token context window

Pro's defining feature is its 1 million token context. That fits roughly 1,500 pages of text or 50 hours of audio transcript. For document Q&A and codebase analysis, this lets you skip RAG entirely and stuff the full corpus into the prompt — at the cost of higher per-call spend and higher latency. Once you account for retrieval infrastructure, the economics roughly break even with RAG around 50K-100K total corpus tokens; below that, stuffing is usually cheaper and simpler.
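A rough per-call comparison makes the trade-off concrete. The numbers below are illustrative assumptions, not measurements: RAG is modeled as retrieving ~8K tokens of chunks per query, retrieval infrastructure cost is ignored, and output cost (identical either way) is omitted:

```python
# Stuffing vs. RAG input cost on Gemini 2.5 Pro.
# Assumptions (hypothetical): RAG sends ~8K tokens of retrieved chunks
# per query; retrieval infra cost and output cost are ignored.

def input_cost(tokens):
    """Input-side dollar cost on Pro's two-tier pricing."""
    rate = 1.25 if tokens <= 200_000 else 2.50  # $/M tokens
    return tokens * rate / 1_000_000

RAG_CONTEXT = 8_000
for corpus in (50_000, 100_000, 500_000):
    print(f"{corpus:>7}-token corpus: "
          f"stuffing ${input_cost(corpus):.4f}/call vs "
          f"RAG ${input_cost(RAG_CONTEXT):.4f}/call")
```

At small corpus sizes the per-call gap is fractions of a cent, which is easily outweighed by not running a retrieval pipeline; at 500K tokens stuffing also crosses into the long-context tier.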

When to pick which Gemini

Pro for hard reasoning, long documents, and multimodal work. Flash for everyday chat and structured extraction at scale — it's often the best price-per-performance option in the market. Flash-Lite for ultra-high-volume classification, routing, and lightweight agents where the cheapest token is the right token.

Compare against alternatives: GPT API cost calculator, Claude API cost calculator, and the LLM cost comparison. For long-context budgeting, see the context window calculator. Estimate token counts with the token counter.

Frequently Asked Questions

How does Gemini 2.5 Pro long-context pricing work?
Pro has a tiered structure. Up to 200K input tokens: $1.25 input / $5 output per million. Above 200K: $2.50 / $10. Most prompts stay under the threshold, but RAG over large document sets can cross it — model accordingly.
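The threshold behaves as a step, not a ramp — worth modeling explicitly, since a single token can double the input bill (assuming, as Google's price list does, that the higher rate applies to the whole request once input exceeds 200K):

```python
# Cost jump at the 200K threshold. Assumption: the higher rate applies
# to the entire request once input exceeds 200K tokens.

def pro_input_cost(tokens):
    rate = 1.25 if tokens <= 200_000 else 2.50  # $/M input tokens
    return tokens * rate / 1_000_000

print(pro_input_cost(200_000))  # 0.25
print(pro_input_cost(200_001))  # ~0.50 — one extra token doubles input cost
```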
Which Gemini model is cheapest?
Flash-Lite at $0.10 input / $0.40 output per million tokens. It's roughly 12x cheaper than Pro on both input and output, and 3x cheaper than Flash on input (about 6x on output). Use it for classification, simple extraction, and agentic tool routing where deep reasoning isn't needed.
How does Gemini compare to GPT-4o on cost?
Gemini 2.5 Pro ($1.25/$5) is roughly half the price of GPT-4o ($2.50/$10). Gemini 2.5 Flash ($0.30/$2.50) sits between GPT-4o-mini and GPT-4o. Flash-Lite undercuts everything in the cost-per-token race.
What's special about Gemini's 1M context window?
Gemini 2.5 Pro accepts up to 1 million tokens of input — roughly 750K English words or 1500 pages. This enables document-stuffing patterns where you skip RAG entirely and just paste the corpus. The trade-off is latency and the long-context price tier above 200K.
Does Google offer Gemini context caching?
Yes. Context caching drops cached input cost. Discounts vary by model and storage duration but commonly range from 25-75% off cached portions. Useful when you have a stable 32K+ token prefix reused across many requests.
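The savings from caching a stable prefix can be sketched as follows. The 75%-off figure is an assumption from the discount range above, and per-hour cache storage fees (billed separately) are omitted:

```python
# Context caching savings sketch on Gemini 2.5 Pro input pricing.
# Assumptions (hypothetical): cached tokens billed at 25% of the full
# input rate (75% off); cache storage fees are ignored.

IN_RATE = 1.25          # $/M input tokens, Pro standard tier
CACHED_FRACTION = 0.25  # assumed billing rate for cached tokens

def cached_input_cost(prefix_tokens, suffix_tokens, cached):
    """Input cost when a shared prefix may be served from cache."""
    prefix_rate = IN_RATE * CACHED_FRACTION if cached else IN_RATE
    return (prefix_tokens * prefix_rate + suffix_tokens * IN_RATE) / 1_000_000

# 32K-token system prompt + 500-token user query, with and without caching:
print(f"${cached_input_cost(32_000, 500, cached=False):.6f} uncached")
print(f"${cached_input_cost(32_000, 500, cached=True):.6f} cached")
```

Across thousands of requests reusing the same prefix, the discount compounds into most of the input bill.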