Dev Tools

Prompt Caching Savings Calculator

Compare cached vs uncached input cost across Anthropic (90% off reads) and OpenAI (50% off).

Quick Answer

Anthropic prompt caching: writes cost 1.25x base, reads cost 0.1x — a 90% discount. Break-even comes after a single cache read, since the 0.25x write premium is smaller than the 0.9x saved on that read. OpenAI caches automatically at 50% off cached reads. For a 5K-token prefix reused 20x, Anthropic saves ~85%; OpenAI saves ~47%.

Calculator inputs:
- Stable prefix: system prompt + tool defs + stable docs
- Per-query content: user query + dynamic context
- Cache reads per write: reuses before cache expiration

| Model | Cached reads | No cache | With cache | Saved | % saved |
|---|---|---|---|---|---|
| Claude Opus 4.7 | 0.1x | $4125.00 | $955.36 | $3169.64 | 77% |
| Claude Sonnet 4.6 | 0.1x | $825.00 | $191.07 | $633.93 | 77% |
| Claude Haiku 4.5 | 0.1x | $275.00 | $63.69 | $211.31 | 77% |
| GPT-4o | 0.5x | $687.50 | $389.88 | $297.62 | 43% |
| GPT-4o-mini | 0.5x | $41.25 | $23.39 | $17.86 | 43% |

Output token cost is unaffected by caching and excluded from this comparison. Anthropic requires explicit cache_control flags; OpenAI auto-caches identical prefixes ≥1024 tokens. Cache TTL: 5min default (Anthropic), variable (OpenAI).

About This Tool

The Prompt Caching Savings Calculator quantifies what prompt caching saves on your specific workload. Enter the size of your stable prefix (system prompt, tool definitions, retrieved documents), the size of variable per-query content, the number of cache reads per write before the cache expires, and your monthly request volume. The tool computes input cost with and without caching across Anthropic and OpenAI models.
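The calculation behind the tool can be sketched in a few lines of Python (function and parameter names are illustrative, not the tool's actual code):

```python
def input_cost(prefix_toks, variable_toks, reads_per_write, price_per_mtok,
               write_mult=1.0, read_mult=1.0):
    """Input cost in USD for one cache cycle: one write plus N reads.

    With write_mult=read_mult=1.0 this is the uncached baseline.
    Anthropic: write_mult=1.25, read_mult=0.1. OpenAI: 1.0 and 0.5.
    """
    prefix = (write_mult + read_mult * reads_per_write) * prefix_toks
    variable = (1 + reads_per_write) * variable_toks  # never cached
    return (prefix + variable) * price_per_mtok / 1_000_000

# Quick Answer scenario: 5K-token prefix reused 20x, no variable content,
# at an assumed $3.00/MTok base input price
uncached = input_cost(5000, 0, 20, 3.0)
anthropic = input_cost(5000, 0, 20, 3.0, write_mult=1.25, read_mult=0.1)
openai = input_cost(5000, 0, 20, 3.0, write_mult=1.0, read_mult=0.5)
print(f"Anthropic saves {1 - anthropic / uncached:.1%}")  # 84.5%
print(f"OpenAI saves {1 - openai / uncached:.1%}")        # 47.6%
```

The variable portion is billed at full price on every call, which is why real workloads with large dynamic payloads see somewhat lower overall savings than the prefix-only figures above.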

Anthropic prompt caching mechanics

Cache writes cost 1.25x the base input rate — a 25% premium for storing the prefix. Cache reads cost 0.1x — a 90% discount. The break-even is a single cache read: the 0.25x write premium is more than recouped by the 0.9x saved on the first read, and every additional reuse is pure savings. Cache TTL defaults to 5 minutes (refreshed with each hit); an extended 1-hour cache is available for batch workloads.
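The break-even arithmetic, normalized so a full-price call on the prefix costs 1.0 (a sketch, not provider code):

```python
WRITE = 1.25  # Anthropic cache-write multiplier on the base input rate
READ = 0.10   # Anthropic cache-read multiplier

def cached(n_reads):
    """Cost of one cache write followed by n_reads cached reads."""
    return WRITE + READ * n_reads

def uncached(n_reads):
    """Same traffic with no caching: 1 + n_reads full-price calls."""
    return 1.0 * (1 + n_reads)

# Caching first becomes cheaper at n = 1:
# cached(1) = 1.35 < uncached(1) = 2.00
breakeven = next(n for n in range(100) if cached(n) < uncached(n))
print(breakeven)  # 1
```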

To use Anthropic caching, mark cacheable blocks in your messages with cache_control: { type: "ephemeral" }. Up to 4 cache checkpoints are supported per request. Common patterns: cache the system prompt, cache tool definitions separately, cache retrieved documents that span multiple follow-up turns.
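A sketch of what a cache-enabled request body looks like, following the Messages API shape from Anthropic's docs (the model name, tool, and prompt text are placeholders):

```python
# Request body with two cache checkpoints: one after the tool
# definitions, one after the system prompt.
payload = {
    "model": "claude-sonnet-4-5",  # placeholder model name
    "max_tokens": 1024,
    "tools": [
        {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "input_schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
            # checkpoint 1: caches everything up to and including the tools
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "system": [
        {
            "type": "text",
            "text": "You are a support agent. <long stable instructions>",
            # checkpoint 2: caches the system prompt as well
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        # dynamic user input stays after the checkpoints, uncached
        {"role": "user", "content": "What's the weather in Oslo?"}
    ],
}
```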

OpenAI prompt caching mechanics

OpenAI auto-caches identical prefixes of 1024+ tokens. Cache reads cost 0.5x base — a 50% discount with no write premium — so break-even is immediate. The trade-off: you don't control which prefixes get cached or for how long; OpenAI manages cache placement server-side with its own heuristics. Less control, more automatic.
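OpenAI reports how many prompt tokens hit the cache in each response (under `usage.prompt_tokens_details.cached_tokens`), which makes the effective discount easy to compute. A sketch, with an assumed base price:

```python
def effective_input_cost(prompt_tokens, cached_tokens, price_per_mtok):
    """Effective input bill when cached tokens are discounted 50% (sketch).

    cached_tokens comes from the API response, under
    usage.prompt_tokens_details.cached_tokens.
    """
    full_price = prompt_tokens - cached_tokens
    return (full_price + 0.5 * cached_tokens) * price_per_mtok / 1_000_000

# 2048-token prompt where the first 1024 tokens hit the cache,
# at an assumed $2.50/MTok base input price:
cost = effective_input_cost(2048, 1024, 2.50)
print(f"${cost:.5f}")  # $0.00384
```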

Real-world savings examples

Customer support bot with a 10K-token system prompt + 50 tool definitions, reused across 50 turns per session: Anthropic saves ~87% on input cost; OpenAI saves ~47%. RAG application with 5K-token retrieved documents reused for 5 follow-up questions: Anthropic saves ~75%; OpenAI saves ~40%. Code agent with 8K-token codebase context reused 30 times: Anthropic saves ~85%; OpenAI saves ~48%.

How to maximize hit rate

Frontload stable content. Move system prompts and tool definitions to the very start of the message array. Put dynamic user input last. Avoid timestamp injection or per-request UUIDs in the cached portion — they invalidate the cache. Standardize formatting (whitespace, ordering) so byte-equivalent prefixes match consistently.
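One way to apply the "standardize formatting" advice is to serialize the stable prefix canonically, so byte-identical prefixes match across requests (a sketch; the helper name is made up):

```python
import json

def canonical_prefix(system_prompt, tools):
    """Serialize the stable prefix deterministically (sketch).

    Sorted keys, sorted tool order, and fixed separators remove the
    whitespace and ordering drift that silently breaks prefix matching.
    """
    return json.dumps(
        {"system": system_prompt.strip(),
         "tools": sorted(tools, key=lambda t: t["name"])},
        sort_keys=True,
        separators=(",", ":"),
    )

# Same content, different formatting and tool order -> identical bytes
a = canonical_prefix("You are a bot.", [{"name": "b"}, {"name": "a"}])
b = canonical_prefix("You are a bot.  ", [{"name": "a"}, {"name": "b"}])
print(a == b)  # True
```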

Pair with the Claude cost calculator, GPT cost calculator, function calling cost calculator, and LLM cost comparison. To shrink prompts before caching, use the prompt token optimizer; for text-to-token estimates, use the token counter.

Frequently Asked Questions

How do Anthropic and OpenAI prompt caching differ?
Anthropic charges 1.25x base on cache writes and 0.1x on reads — a 90% discount on cached portions. OpenAI charges base price on writes and 0.5x on cached reads (50% discount, automatic). Anthropic saves more per cached call but requires explicit cache_control flags.
What's the break-even for caching?
Anthropic: a single read. The 0.25x premium on the write is more than recouped by the first cached read, which saves 0.9x. OpenAI: break-even is immediate since writes are at base price. Beyond break-even, every reuse compounds savings.
What can be cached?
Stable prefixes — system prompts, tool definitions, retrieved documents, few-shot examples. The cache key is the prefix bytes. Even a single character change invalidates the entry. Cache TTL defaults to 5 minutes on Anthropic; 1-hour extended cache available.
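The single-character sensitivity can be illustrated with a hash-style key — purely illustrative, since the providers' actual cache keying is internal:

```python
import hashlib

def cache_key(prefix: str) -> str:
    """Illustrative prefix key: any byte change yields a new key."""
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()

stable = cache_key("You are a helpful assistant.")
drifted = cache_key("You are a helpful assistant. ")  # one trailing space
print(stable == drifted)  # False
```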
Does caching work across users?
Yes if the prefix is identical. A SaaS app with a global system prompt benefits from cache hits across all users. Personalized prompts (user names, preferences) don't share cache entries. Structure prompts so personal data comes after stable content.
Why doesn't caching help small prompts?
Anthropic enforces a 1024-token minimum for caching. Below that, your prompt isn't cacheable. OpenAI auto-caches at 1024 tokens too. Optimize for caching by frontloading stable context — even moving a static instruction block to the system prompt unlocks caching.