Whisper API Cost Calculator

OpenAI Whisper at $0.006/min. Toggle batch discount, compare against Deepgram and AssemblyAI.

Quick Answer

OpenAI Whisper is $0.006/minute, or $0.36/hour. Batch API drops to $0.003/min (50% off, 24h SLA). 100 hours/month costs $36 standard or $18 batch. Self-hosted faster-whisper on GPU runs near $0.0005/min at scale.

Example volume: 3,000 minutes per month

Monthly cost: $18.00

Yearly cost: $216.00

Compare against competitors

Provider                      Notes                     Per min   Monthly   Yearly
OpenAI Whisper API            Default option            $0.0060   $18.00    $216.00
OpenAI Whisper (Batch)        50% off, 24h SLA          $0.0030   $9.00     $108.00
Deepgram Nova-3               Real-time, diarization    $0.0043   $12.90    $154.80
AssemblyAI Universal-2        Speaker labels included   $0.0065   $19.50    $234.00
Self-hosted faster-whisper    GPU-amortized             $0.0005   $1.50     $18.00
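
The comparison above is a single multiplication per provider. The following is a minimal sketch of that arithmetic, assuming the per-minute rates listed in the table; the names RATES_PER_MIN and compare are illustrative and not part of any provider's SDK.

```python
# Per-minute list prices in USD, copied from the comparison table.
RATES_PER_MIN = {
    "OpenAI Whisper API": 0.0060,
    "OpenAI Whisper (Batch)": 0.0030,
    "Deepgram Nova-3": 0.0043,
    "AssemblyAI Universal-2": 0.0065,
    "Self-hosted faster-whisper": 0.0005,
}


def compare(monthly_minutes: float) -> list[tuple[str, float, float, float]]:
    """Return (provider, rate, monthly, yearly) rows, cheapest first."""
    rows = []
    for provider, rate in RATES_PER_MIN.items():
        monthly = round(monthly_minutes * rate, 2)
        # Yearly is 12x the rounded monthly figure, matching the table.
        yearly = round(monthly * 12, 2)
        rows.append((provider, rate, monthly, yearly))
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    for provider, rate, monthly, yearly in compare(3000):
        print(f"{provider:<28} ${rate:.4f}  ${monthly:>8.2f}  ${yearly:>9.2f}")
```

Sorting by monthly cost makes the price ordering explicit, but it ignores features such as streaming or diarization, which the sections below treat as the real deciding factors.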

About This Tool

The Whisper API Cost Calculator estimates monthly and yearly transcription cost for a given monthly volume of audio. OpenAI Whisper bills at $0.006 per minute of audio processed, with a 50% Batch API discount that drops the rate to $0.003/min in exchange for a 24-hour completion SLA.

What Whisper costs in practice

A 60-minute podcast: $0.36. A 30-minute customer call: $0.18. A 10-hour transcription day: $3.60. Most teams running meeting transcription, podcast indexing, or call analytics land between $50 and $500 per month on Whisper alone. The Batch API cuts that in half if you can wait overnight.
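
The figures above can be reproduced with a few lines of Python. This is a rough sketch using the two rates quoted in this article; the function names whisper_cost and yearly_cost are made up for illustration.

```python
STANDARD_RATE = 0.006  # USD per audio minute (standard API rate from the text)
BATCH_RATE = 0.003     # USD per audio minute with the 50% batch discount


def whisper_cost(minutes: float, batch: bool = False) -> float:
    """Cost in USD for transcribing `minutes` of audio."""
    rate = BATCH_RATE if batch else STANDARD_RATE
    return round(minutes * rate, 2)


def yearly_cost(monthly_minutes: float, batch: bool = False) -> float:
    """Twelve months at a constant monthly volume."""
    return round(whisper_cost(monthly_minutes, batch) * 12, 2)


if __name__ == "__main__":
    print(f"60-minute podcast:   ${whisper_cost(60):.2f}")
    print(f"30-minute call:      ${whisper_cost(30):.2f}")
    print(f"10-hour day:         ${whisper_cost(10 * 60):.2f}")
    print(f"100 hours, standard: ${whisper_cost(100 * 60):.2f}")
    print(f"100 hours, batch:    ${whisper_cost(100 * 60, batch=True):.2f}")
```

Costs are rounded to cents, so very short clips can display as $0.00 or $0.01 here even though billing is per minute of audio.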

Whisper vs the competition

Deepgram Nova-3 at $0.0043/min is faster, supports real-time streaming, and includes diarization (speaker labels) by default. Whisper requires post-processing for speaker separation. AssemblyAI Universal-2 ($0.0065/min) bundles speaker labels, PII redaction, summarization, and entity extraction — often worth the small premium for compliance-sensitive use cases.

For real-time use cases (live captioning, voice agents), Whisper's API is not streaming — you upload the file, wait for the full transcript, and respond. Deepgram and AssemblyAI both offer true streaming. If you build a voice AI product, you'll likely need one of those plus a fast TTS layer.

Self-hosted economics

faster-whisper on a single $0.30/hour cloud GPU (an A10 or L4) transcribes roughly 10 hours of audio per hour of wall-clock time using the large-v3 model. That works out to $0.30 / 600 audio minutes = $0.0005 per minute of audio, 12x cheaper than the API. Below ~500 hours/month, the API wins on total cost of ownership once you factor in DevOps time. Above that threshold, self-hosting starts paying off.
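
The self-hosted rate and the break-even reasoning can be checked with a short calculation. The sketch below assumes the GPU price and throughput quoted above; self_hosted_rate and monthly_savings are hypothetical helper names. The resulting savings figure at 500 hours/month is an inference from those numbers, not a value stated by any provider.

```python
API_RATE = 0.006  # USD per audio minute, standard Whisper API rate from the text


def self_hosted_rate(gpu_usd_per_hour: float, audio_hours_per_gpu_hour: float) -> float:
    """Amortized GPU cost in USD per minute of audio transcribed."""
    audio_minutes_per_gpu_hour = audio_hours_per_gpu_hour * 60
    return gpu_usd_per_hour / audio_minutes_per_gpu_hour


def monthly_savings(audio_hours: float, api_rate: float, hosted_rate: float) -> float:
    """Raw compute savings in USD per month, before any DevOps overhead."""
    return audio_hours * 60 * (api_rate - hosted_rate)


if __name__ == "__main__":
    rate = self_hosted_rate(gpu_usd_per_hour=0.30, audio_hours_per_gpu_hour=10)
    print(f"Self-hosted rate: ${rate:.4f}/min")
    print(f"API is {API_RATE / rate:.0f}x more expensive per minute")
    # Savings at the ~500 hours/month threshold mentioned in the text.
    print(f"Savings at 500 h/month: ${monthly_savings(500, API_RATE, rate):.2f}")
```

At 500 hours/month the compute savings come to roughly $165, so the threshold only holds if ongoing operational overhead is in that range. Teams with higher engineering costs would need more volume before self-hosting pays off.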

Pair this with the TTS cost calculator for full voice pipeline budgeting. For the cost of running transcripts through an LLM downstream, see the GPT cost calculator and the Claude cost calculator. Estimate transcript token volume with the token counter.

Frequently Asked Questions

How much does OpenAI Whisper cost?
$0.006 per minute of audio. A one-hour podcast costs $0.36. A 10-minute meeting transcript costs $0.06. The Batch API option drops the rate to $0.003/minute with a 24-hour completion SLA.
How does Whisper compare to Deepgram and AssemblyAI?
Whisper costs $0.006/min and supports 99 languages. Deepgram Nova-3 ($0.0043/min) has the lower list price and is faster, with real-time streaming and built-in diarization. AssemblyAI ($0.0065/min) bundles speaker labels and PII redaction. Pick by feature need, not just price.
Is self-hosted Whisper cheaper?
At scale, yes. faster-whisper on a $0.30/hour GPU processes ~10 hours of audio per hour of compute, landing near $0.0005/minute. Below ~500 hours/month, the API wins on TCO once you include DevOps time.
Does Whisper handle non-English audio?
Yes — Whisper supports 99 languages with automatic detection. Quality varies by language: English, Spanish, French, German, and Mandarin are best. Low-resource languages may need post-processing or alternative providers.
Are there file size or duration limits?
OpenAI's API caps single uploads at 25MB. Long files need chunking — split into 10-15 minute segments with overlap. The Batch API handles longer files but adds the 24-hour delay. For real-time, use streaming with Deepgram or AssemblyAI.
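
Chunking with overlap comes down to computing segment boundaries before any audio is cut. The sketch below shows one way to do that in plain Python. The 10-minute chunk length follows the guidance above, while the 5-second overlap is an assumed value, since no specific overlap is given here. The function name chunk_ranges is illustrative, and the actual slicing would be done by an audio tool of your choice.

```python
def chunk_ranges(
    total_seconds: float,
    chunk_seconds: float = 600.0,   # 10-minute segments, per the guidance above
    overlap_seconds: float = 5.0,   # assumed overlap; tune for your audio
) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds covering the whole file.

    Consecutive segments share `overlap_seconds` of audio so that words
    cut at a boundary appear whole in at least one segment.
    """
    if chunk_seconds <= overlap_seconds:
        raise ValueError("chunk_seconds must be greater than overlap_seconds")

    ranges = []
    step = chunk_seconds - overlap_seconds
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        ranges.append((start, end))
        if end >= total_seconds:
            break  # last segment reached the end of the file
        start += step
    return ranges


if __name__ == "__main__":
    # A 25-minute recording split into 10-minute segments.
    for start, end in chunk_ranges(25 * 60):
        print(f"{start:7.1f}s -> {end:7.1f}s")
```

Each segment is transcribed separately, and the duplicated text in the overlap regions has to be de-duplicated when the transcripts are stitched back together.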