Text-to-Speech Cost Calculator

Compare ElevenLabs, OpenAI, Azure, Google, and PlayHT TTS pricing for any volume.

Quick Answer

OpenAI TTS-1 ($0.015/1K chars) is the cheapest, with Azure and Google close behind (both $0.016/1K). ElevenLabs Creator ($0.30/1K) costs 20x as much as OpenAI TTS-1 but sounds substantially more human. 1,000 characters ≈ 1.3 minutes of speech.

Total: 100,000 chars/month, 133.3 minutes of speech

| Provider | Notes | $/1K chars | Monthly |
|---|---|---|---|
| ElevenLabs Creator | Realistic voice cloning | $0.300 | $30.00 |
| ElevenLabs Pro | Bulk discount tier | $0.180 | $18.00 |
| ElevenLabs Scale | Enterprise volume | $0.090 | $9.00 |
| OpenAI TTS-1 | Fast, 6 voices | $0.015 | $1.50 |
| OpenAI TTS-1-HD | Higher quality | $0.030 | $3.00 |
| Azure Neural TTS | 150+ voices, 75 langs | $0.016 | $1.60 |
| Google Cloud TTS WaveNet | WaveNet voices | $0.016 | $1.60 |
| PlayHT 2.0 | Voice cloning | $0.075 | $7.50 |

About This Tool

The TTS Cost Calculator compares text-to-speech pricing across eight plans from five providers. Enter characters per request and monthly request count, and the tool computes total characters, minutes of equivalent speech, and projected monthly cost for ElevenLabs, OpenAI TTS, Azure Neural, Google Cloud TTS, and PlayHT.
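The calculation itself is simple enough to reproduce offline. The following is a minimal sketch, not the tool's actual source: the function name `estimate` and the result layout are illustrative assumptions, the rates are copied from the comparison table above, and the 750 chars/minute figure is the conversion this page uses.

```python
# Rates in USD per 1,000 characters, copied from the comparison table.
RATES_PER_1K = {
    "ElevenLabs Creator": 0.300,
    "ElevenLabs Pro": 0.180,
    "ElevenLabs Scale": 0.090,
    "OpenAI TTS-1": 0.015,
    "OpenAI TTS-1-HD": 0.030,
    "Azure Neural TTS": 0.016,
    "Google Cloud TTS WaveNet": 0.016,
    "PlayHT 2.0": 0.075,
}

# 150 words/min x 5 chars/word (the page's English-speech assumption).
CHARS_PER_MINUTE = 750


def estimate(chars_per_request: int, requests_per_month: int) -> dict:
    """Return total characters, speech minutes, and monthly cost per plan."""
    total_chars = chars_per_request * requests_per_month
    minutes = total_chars / CHARS_PER_MINUTE
    monthly_cost = {
        plan: round(total_chars / 1000 * rate, 2)
        for plan, rate in RATES_PER_1K.items()
    }
    return {
        "total_chars": total_chars,
        "minutes": round(minutes, 1),
        "monthly_cost": monthly_cost,
    }


# Hypothetical workload: 200 requests of 500 characters each.
result = estimate(chars_per_request=500, requests_per_month=200)
for plan, cost in sorted(result["monthly_cost"].items(), key=lambda kv: kv[1]):
    print(f"{plan:<26} ${cost:>6.2f}")
```

With 500 characters per request and 200 requests per month, this reproduces the 100,000-character scenario shown in the table.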

The character-per-minute conversion

English speech averages 150 words per minute at 5 characters per word — roughly 750 characters per minute. A 10-minute audiobook chapter consumes about 7500 characters. A 30-second voice AI response runs around 375 characters. Use this conversion to translate audio length goals into TTS billing units.
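As a rough sketch of that conversion in code (the helper names `chars_for_audio` and `audio_minutes` are illustrative, and the constants are the English averages quoted above rather than measured values):

```python
WORDS_PER_MINUTE = 150   # average English speaking rate used on this page
CHARS_PER_WORD = 5       # average word length used on this page
CHARS_PER_MINUTE = WORDS_PER_MINUTE * CHARS_PER_WORD  # 750


def chars_for_audio(seconds: float) -> int:
    """Estimate how many characters of text produce `seconds` of speech."""
    return round(seconds / 60 * CHARS_PER_MINUTE)


def audio_minutes(chars: int) -> float:
    """Estimate minutes of speech produced by `chars` characters of text."""
    return chars / CHARS_PER_MINUTE


print(chars_for_audio(600))             # 10-minute chapter
print(chars_for_audio(30))              # 30-second voice response
print(round(audio_minutes(1000), 2))    # minutes per 1,000 characters
```

The three calls correspond to the 10-minute chapter, the 30-second response, and the per-1,000-character figure used elsewhere on this page.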

Quality tiers in 2026

Three quality tiers exist. The cheap tier ($0.015-$0.030/1K chars) covers OpenAI TTS-1, Azure Neural, and Google WaveNet. These voices sound clearly synthetic but professional, which is fine for narration, IVR, and accessibility. The mid tier ($0.05-$0.10) covers PlayHT and ElevenLabs Scale, with better prosody and more natural emotion. The premium tier ($0.18-$0.30) covers ElevenLabs Pro and Creator, with indistinguishable-from-human quality, especially with voice cloning.

When does ElevenLabs justify the price?

For voice agents that customers interact with conversationally, paying $0.30/1K is often worth it — the difference between “robotic IVR” and “sounds like a real assistant” affects conversion rates. For background narration, audiobooks, or notification messages, OpenAI TTS at $0.015/1K is fine. Test ElevenLabs Flash if you need both speed (sub-200ms) and quality.

Volume discounts and self-hosting

The ElevenLabs Scale tier lands near $0.09/1K at enterprise commit, about 30% of the Creator rate. Open-source TTS (XTTS-v2, Coqui, Bark) self-hosted on a $0.30/hour GPU runs around $0.001/1K chars amortized. Its quality is closing on commercial offerings but still trails on emotional range.
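The self-hosting figure depends heavily on throughput, so it helps to check what it implies. The sketch below is an illustration under stated assumptions: the GPU is kept fully busy, storage, bandwidth, and engineering time are ignored, and the function names are hypothetical. It uses the 750 chars/minute conversion from this page.

```python
def self_hosted_cost_per_1k(gpu_usd_per_hour: float, chars_per_hour: float) -> float:
    """Amortized USD per 1,000 characters for a fully utilized GPU."""
    return gpu_usd_per_hour / (chars_per_hour / 1000)


def required_realtime_factor(
    gpu_usd_per_hour: float,
    target_usd_per_1k: float,
    chars_per_minute: float = 750,
) -> float:
    """How many times faster than real time synthesis must run to hit the target cost."""
    chars_per_hour = gpu_usd_per_hour / target_usd_per_1k * 1000
    audio_minutes_per_hour = chars_per_hour / chars_per_minute
    return audio_minutes_per_hour / 60


# Figures quoted on this page: $0.30/hour GPU, $0.001 per 1K chars.
factor = required_realtime_factor(0.30, 0.001)
print(f"Needs about {factor:.1f}x real-time synthesis")

# Cost if the model only keeps pace with real time (750 chars/min = 45,000 chars/hour).
print(f"${self_hosted_cost_per_1k(0.30, 45_000):.4f} per 1K chars at 1x real time")
```

Under these assumptions the $0.001/1K figure requires the GPU to synthesize roughly 300,000 characters per hour. A model that only keeps pace with real time would cost several times more per character, so measure your own throughput before budgeting.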

Pair this tool with the Whisper cost calculator to cover the input side of a voice agent. For LLM-driven response generation, see the GPT cost calculator and the Claude cost calculator. Estimate text size with the character counter.

Frequently Asked Questions

Which TTS provider is cheapest?
OpenAI TTS-1 is cheapest at $0.015 per 1,000 characters, followed by Azure and Google at $0.016. They sound 'good enough' for narration. ElevenLabs costs 6-20x as much depending on tier but produces noticeably more emotional, human-like output. Pick by quality requirement.
How do ElevenLabs pricing tiers work?
ElevenLabs sells character credits in monthly bundles. Creator costs $22/mo for 100K chars and Pro costs $99/mo for 500K, which average $0.22 and about $0.20 per 1,000 included chars; the calculator budgets these tiers at $0.30 and $0.18. Scale (custom enterprise) lands near $0.09. Volume commits beat pay-as-you-go.
How do I convert characters to minutes of audio?
English speech runs ~150 words per minute at ~5 chars per word, so roughly 750 chars per minute. A 10-minute audio narration consumes ~7,500 characters. Languages with longer words (German) or non-alphabetic scripts (Chinese) shift the ratio.
Is voice cloning extra?
ElevenLabs and PlayHT charge a one-time setup or include voice cloning at higher tiers. OpenAI doesn't offer cloning publicly. Azure has Custom Neural Voice (enterprise-only, ~$5K setup). Voice cloning quality varies significantly by provider.
Are there real-time streaming options?
ElevenLabs and OpenAI both stream audio with first-byte latency under 500ms, which is critical for voice AI agents and live conversation. Azure and Google support streaming but with higher latency. For voice agents, ElevenLabs Flash and Cartesia are the speed leaders.