Prompt Caching Savings Calculator
Compare cached vs uncached input cost across Anthropic (90% off reads) and OpenAI (50% off).
Quick Answer
Anthropic prompt caching: writes cost 1.25x the base input rate, reads cost 0.1x (a 90% discount). The 25% write premium pays for itself on the very first cache read. OpenAI caches automatically and bills cached input tokens at 50% off. For a 5K-token prefix reused 20x, Anthropic saves ~85%; OpenAI saves ~47%.
Calculator inputs:
- Stable prefix: system prompt + tool defs + stable docs
- Dynamic per-request content: user query + dynamic context
- Cache reads per write: reuses before cache expiration
| Model | No cache | With cache | Saved | % |
|---|---|---|---|---|
| Claude Opus 4.7 (0.1x reads) | $4125.00 | $955.36 | $3169.64 | 77% |
| Claude Sonnet 4.6 (0.1x reads) | $825.00 | $191.07 | $633.93 | 77% |
| Claude Haiku 4.5 (0.1x reads) | $275.00 | $63.69 | $211.31 | 77% |
| GPT-4o (0.5x reads) | $687.50 | $389.88 | $297.62 | 43% |
| GPT-4o-mini (0.5x reads) | $41.25 | $23.39 | $17.86 | 43% |
The figures above assume a 5,000-token cached prefix, 500 dynamic tokens per request, 20 cache reads per write, and 50,000 requests per month. Output token cost is unaffected by caching and is excluded from this comparison. Anthropic requires explicit cache_control flags; OpenAI auto-caches identical prefixes of 1,024+ tokens. Cache TTL: 5 minutes by default on Anthropic, variable on OpenAI.
About This Tool
The Prompt Caching Savings Calculator quantifies what prompt caching saves on your specific workload. Enter the size of your stable prefix (system prompt, tool definitions, retrieved documents), the size of variable per-query content, the number of cache reads per write before the cache expires, and your monthly request volume. The tool computes input cost with and without caching across Anthropic and OpenAI models.
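As a rough sketch of the arithmetic the calculator runs (the function name, argument names, and example price below are illustrative assumptions, not the tool's actual source), the cached and uncached input cost for one write-plus-reads cycle can be computed like this:

```python
# Illustrative sketch of the calculator's core arithmetic (assumed, not the tool's source).
# Prices are USD per million input tokens and may change; check provider pricing pages.

def input_cost_per_cycle(prefix_tokens: int, dynamic_tokens: int, reads_per_write: int,
                         price_per_mtok: float, write_mult: float, read_mult: float) -> tuple[float, float]:
    """Return (uncached, cached) input cost for one cache write plus N cache reads."""
    requests = reads_per_write + 1
    per_request = prefix_tokens + dynamic_tokens
    uncached = requests * per_request * price_per_mtok / 1e6
    cached = ((write_mult * prefix_tokens + dynamic_tokens)                  # first request writes the cache
              + reads_per_write * (read_mult * prefix_tokens + dynamic_tokens)) * price_per_mtok / 1e6
    return uncached, cached

# Anthropic-style rates: 1.25x writes, 0.1x reads (example price: $3.00 per million input tokens).
uncached, cached = input_cost_per_cycle(5_000, 500, 20, price_per_mtok=3.00,
                                        write_mult=1.25, read_mult=0.10)
print(f"uncached ${uncached:.4f}  cached ${cached:.4f}  saved {1 - cached / uncached:.0%}")
```

Multiplying the per-cycle cost by the number of cycles in your monthly volume gives totals like those in the table above.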
Anthropic prompt caching mechanics
Cache writes cost 1.25x the base input rate (a 25% premium for storing the prefix); cache reads cost 0.1x (a 90% discount). The write premium pays for itself on the very first read: one write plus one read costs 1.35x, versus 2x for two uncached requests, and every additional reuse is pure savings. Cache TTL defaults to 5 minutes (refreshed with each hit); an extended 1-hour cache is available for batch workloads.
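A two-line sanity check of that break-even, using the multipliers above (variable names are just for illustration):

```python
# Per prefix token: an uncached request costs 1.0x; a cache write costs 1.25x, a cache read 0.1x.
base, write, read = 1.00, 1.25, 0.10
print(write + read, "vs", 2 * base)  # 1.35 vs 2.0 -> caching already wins on the first read
```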
To use Anthropic caching, mark cacheable blocks in your messages with cache_control: { type: "ephemeral" }. Up to 4 cache checkpoints are supported per request. Common patterns: cache the system prompt, cache tool definitions separately, cache retrieved documents that span multiple follow-up turns.
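A minimal sketch of that pattern with the Anthropic Python SDK; the model name, system prompt, and lookup_order tool below are placeholder assumptions:

```python
# Sketch: caching a long system prompt and tool definitions with the Anthropic SDK.
# Model name, prompt text, and the tool schema are placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_SYSTEM_PROMPT = "..."      # the large, stable prefix you want cached

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    tools=[{
        "name": "lookup_order",
        "description": "Look up an order by id.",
        "input_schema": {"type": "object", "properties": {"order_id": {"type": "string"}}},
        "cache_control": {"type": "ephemeral"},   # checkpoint 1: tool definitions
    }],
    system=[{
        "type": "text",
        "text": LONG_SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"},   # checkpoint 2: system prompt
    }],
    messages=[{"role": "user", "content": "Where is order 8812?"}],
)

# The usage block reports how many input tokens were written to / read from the cache.
print(response.usage.cache_creation_input_tokens, response.usage.cache_read_input_tokens)
```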
OpenAI prompt caching mechanics
OpenAI auto-caches identical prefixes of 1024+ tokens. Cache reads cost 0.5x base (a 50% discount) with no write premium, so the break-even is immediate. The trade-off: you don't control which prefixes get cached; requests are routed to servers that recently processed the same prefix, so cache hits are best-effort rather than guaranteed. Less control, more automatic.
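Because caching is automatic, the practical check is whether hits are actually happening. A small sketch with the OpenAI Python SDK (model and message content are placeholders) that reads the cached-token count reported in the response:

```python
# Sketch: verifying OpenAI's automatic prefix caching via the usage block.
# Model name and message content are placeholder assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

STABLE_PREFIX = "..."  # needs to be 1024+ tokens of stable instructions for caching to kick in

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": STABLE_PREFIX},                 # stable content first
        {"role": "user", "content": "Summarize ticket #4521."},       # dynamic content last
    ],
)

usage = resp.usage
cached = usage.prompt_tokens_details.cached_tokens  # 0 on a miss, >0 on a hit
print(f"prompt tokens: {usage.prompt_tokens}, cached: {cached}")
```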
Real-world savings examples
- Customer support bot: 10K-token system prompt + 50 tool definitions, reused across 50 turns per session. Anthropic saves ~87% on input cost; OpenAI saves ~47%.
- RAG application: 5K-token retrieved documents reused for 5 follow-up questions. Anthropic saves ~75%; OpenAI saves ~40%.
- Code agent: 8K-token codebase context reused 30 times. Anthropic saves ~85%; OpenAI saves ~48%.
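These figures follow from the same arithmetic as the earlier sketch; for instance, the support-bot case can be approximated with the assumed input_cost_per_cycle helper, ignoring per-turn dynamic tokens (which pull the numbers slightly toward the quoted ~87% and ~47%):

```python
# Approximate the support-bot example: 10K-token cached prefix, 50 reuses, prefix only.
for label, write_mult, read_mult in [("Anthropic", 1.25, 0.10), ("OpenAI", 1.00, 0.50)]:
    uncached, cached = input_cost_per_cycle(10_000, 0, 50, price_per_mtok=3.00,
                                            write_mult=write_mult, read_mult=read_mult)
    print(f"{label}: saves {1 - cached / uncached:.0%}")
# Anthropic: saves 88%   OpenAI: saves 49%
```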
How to maximize hit rate
Frontload stable content. Move system prompts and tool definitions to the very start of the message array. Put dynamic user input last. Avoid timestamp injection or per-request UUIDs in the cached portion — they invalidate the cache. Standardize formatting (whitespace, ordering) so byte-equivalent prefixes match consistently.
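One way to apply this is to render the cached portion from a fixed template with deterministic serialization and keep anything volatile in the final user turn; a sketch in which the template, helper name, and Acme policies are illustrative assumptions:

```python
# Sketch: keep the cached prefix byte-identical across requests.
# Template text, helper name, and policy contents are illustrative assumptions.
import json
from datetime import date

SYSTEM_TEMPLATE = (
    "You are a support agent for Acme.\n"
    "Policies:\n{policies}\n"
)

def build_messages(policies: dict, user_query: str) -> list[dict]:
    # Deterministic serialization (sorted keys, fixed indentation) keeps the rendered
    # prefix byte-identical for as long as the policies themselves don't change.
    stable_prefix = SYSTEM_TEMPLATE.format(
        policies=json.dumps(policies, sort_keys=True, indent=2)
    )
    return [
        {"role": "system", "content": stable_prefix},                     # cacheable: stable, first
        # Volatile details (dates, request ids) stay in the dynamic part, never in the prefix.
        {"role": "user", "content": f"[{date.today()}] {user_query}"},    # dynamic, last
    ]
```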
Pair with the Claude cost calculator, GPT cost calculator, function calling cost calculator, and LLM cost comparison. To shrink prompts before caching, use the prompt token optimizer; for text-to-token estimates, use the token counter.