LLM Cost Comparison
Side-by-side monthly cost across 12 major LLMs. Enter your token volume, find the cheapest match.
Quick Answer
Cheapest tier: Gemini Flash-Lite ($0.10/$0.40), GPT-4o-mini ($0.15/$0.60), DeepSeek V3 ($0.27/$1.10). Frontier tier: Gemini 2.5 Pro ($1.25/$5) undercuts GPT-4o ($2.50/$10) and Claude Sonnet ($3/$15). For absolute peak quality, Claude Opus 4.7 ($15/$75).
Example workload: 2,000 input + 800 output tokens per call, 10,000 calls per month.

| Model | Provider | In / Out ($/M tokens) | Per call | Monthly |
|---|---|---|---|---|
| Gemini 2.5 Flash-Lite | Google | $0.10 / $0.40 | $0.000520 | $5.20 |
| GPT-4o-mini | OpenAI | $0.15 / $0.60 | $0.000780 | $7.80 |
| DeepSeek V3 | DeepSeek | $0.27 / $1.10 | $0.001420 | $14.20 |
| Llama 3.3 70B | Together | $0.88 / $0.88 | $0.002464 | $24.64 |
| Gemini 2.5 Flash | Google | $0.30 / $2.50 | $0.002600 | $26.00 |
| Claude Haiku 4.5 | Anthropic | $1.00 / $5.00 | $0.006000 | $60.00 |
| Gemini 2.5 Pro | Google | $1.25 / $5.00 | $0.006500 | $65.00 |
| Mistral Large 2 | Mistral | $2.00 / $6.00 | $0.008800 | $88.00 |
| GPT-4o | OpenAI | $2.50 / $10.00 | $0.013000 | $130.00 |
| Claude Sonnet 4.6 | Anthropic | $3.00 / $15.00 | $0.018000 | $180.00 |
| GPT-4-turbo | OpenAI | $10.00 / $30.00 | $0.044000 | $440.00 |
| Claude Opus 4.7 | Anthropic | $15.00 / $75.00 | $0.090000 | $900.00 |
About This Tool
The LLM Cost Comparison tool puts every major language model side-by-side at your specific token volume. Enter input tokens per call, output tokens per call, and monthly request count. The tool computes per-call and monthly cost across 12 production models from OpenAI, Anthropic, Google, Mistral, DeepSeek, and hosted open-source providers.
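The arithmetic behind the comparison is simple to reproduce. Here is a minimal Python sketch of the pricing formula; the workload numbers (2,000 input / 800 output tokens per call, 10,000 calls per month) are inferred from the table above, and the function name and signature are illustrative, not the tool's actual API:

```python
def llm_cost(input_price, output_price, in_tokens, out_tokens, calls_per_month):
    """Per-call and monthly cost. Prices are USD per million tokens."""
    per_call = (in_tokens * input_price + out_tokens * output_price) / 1e6
    return per_call, per_call * calls_per_month

# Gemini 2.5 Flash-Lite at 2,000 input / 800 output tokens, 10,000 calls/month
per_call, monthly = llm_cost(0.10, 0.40, 2000, 800, 10_000)
print(f"${per_call:.6f} per call, ${monthly:.2f}/month")  # $0.000520 per call, $5.20/month
```

Swap in any row's prices from the table to reproduce its per-call and monthly figures.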
The 2026 LLM pricing landscape
Three pricing tiers have emerged. Frontier ($1.25-$15 per million input tokens): Claude Opus, GPT-4o, Gemini 2.5 Pro. Balanced ($0.30-$3): Sonnet 4.6, Haiku 4.5, Gemini Flash, Mistral Large. Cheap ($0.10-$0.30): Flash-Lite, GPT-4o-mini, DeepSeek V3. Hosted open-source (Llama, Mixtral) typically lands in the balanced tier.
How to read the comparison
The cheapest model isn't always the right one. A 50% cheaper model that fails 5% more often may cost more in retry overhead, support tickets, and brand damage. Run your own evals — pick three candidates from this comparison, evaluate on 100+ representative examples, then choose by quality-adjusted cost.
Output token cost dominates most workloads. Most models in this comparison price output 3-8x above input (Llama 3.3's flat $0.88/$0.88 is the lone exception), so costs climb just as steeply when responses run long. Cap max_tokens. Prefer structured outputs over prose. Use cheaper models for first-pass generation and a flagship for final review.
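To make the output-dominance point concrete, here is the split for GPT-4o at the table's example workload, using the listed prices (a sketch, not tool output):

```python
# Output-cost share for GPT-4o ($2.50 in / $10.00 out) on a 2,000-in / 800-out call
in_cost = 2000 * 2.50 / 1e6   # $0.005
out_cost = 800 * 10.00 / 1e6  # $0.008
share = out_cost / (in_cost + out_cost)
print(f"output is {share:.0%} of per-call cost")  # output is 62% of per-call cost
```

Despite being less than a third of the tokens, output carries the majority of the cost, which is why capping max_tokens moves the bill more than trimming the prompt.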
Caching shifts the math
Anthropic's prompt caching cuts the cost of cached input tokens by 90%. OpenAI auto-caches repeated prefixes at 50% off. Gemini's context caching varies by tier. If your prompt has a stable 5K+ token prefix reused across many turns, caching can flip the cost ranking: Sonnet with caching often beats GPT-4o without it on long-context workloads.
Drill deeper with the GPT cost calculator, Claude cost calculator, Gemini cost calculator, and the prompt caching savings calculator. To go from raw text to token estimates, use the token counter.
Frequently Asked Questions
Which LLM has the cheapest API in 2026?
How should I choose between GPT, Claude, and Gemini?
Are open-source models really cheaper?
What about prompt caching savings?
Why is output 4-5x more than input?
You might also like
Markdown Table Generator — Generate markdown tables with custom rows, columns, and alignment.
Vector DB Cost Calculator — Pinecone, Weaviate, Qdrant cost by dimensions, records, and queries.
AI Monthly Budget Calculator — Set a monthly cap, see what usage that buys across LLMs and media.