Dev Tools

Context Window Calculator

Convert words, pages, characters, or minutes of audio into tokens and see how the result fits across major LLM context windows.

Quick Answer

GPT-4o and most OpenAI models cap at 128K tokens; Claude models cap at 200K. Gemini 2.5 leads with 1M tokens (~1,500 pages). Rule of thumb: 1 word ≈ 1.3 tokens; 1 page ≈ 650 tokens; 1 hour of speech ≈ 12K tokens.

Estimated tokens

1,300

Fit across context windows

GPT-4o: 1,300 / 128,000 (1%)
GPT-4o-mini: 1,300 / 128,000 (1%)
GPT-4-turbo: 1,300 / 128,000 (1%)
Claude Opus 4.7: 1,300 / 200,000 (1%)
Claude Sonnet 4.6: 1,300 / 200,000 (1%)
Claude Haiku 4.5: 1,300 / 200,000 (1%)
Gemini 2.5 Pro: 1,300 / 1,000,000 (0%)
Gemini 2.5 Flash: 1,300 / 1,000,000 (0%)
Mistral Large 2: 1,300 / 128,000 (1%)
Llama 3.3 70B: 1,300 / 128,000 (1%)

About This Tool

The Context Window Calculator converts words, pages, characters, or minutes of audio into estimated tokens, then shows how that count fits inside the context windows of every major LLM. It's built for the moment when a customer asks “can the AI read my whole 200-page contract?” and you need a fast yes/no with the right model.

What is a context window?

The context window is the maximum number of tokens a model can process in a single request. It's a hard ceiling: input plus output combined cannot exceed it. GPT-4o and most OpenAI models cap at 128K tokens. Claude Opus, Sonnet, and Haiku all have 200K windows. Gemini 2.5 Pro and Flash both ship with 1M tokens, roughly 1,500 pages of English text or about 85 hours of meeting transcript.
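
A minimal sketch of the fit check this calculator performs, with window sizes copied from the table above (model names are display labels only, not any vendor's API):

```python
# Window sizes copied from the fit table above.
CONTEXT_WINDOWS = {
    "GPT-4o": 128_000,
    "Claude Opus 4.7": 200_000,
    "Gemini 2.5 Pro": 1_000_000,
}

def fit_report(estimated_tokens: int) -> None:
    """Show how an estimated token count fits each published window."""
    for model, window in CONTEXT_WINDOWS.items():
        pct = estimated_tokens / window * 100
        verdict = "fits" if estimated_tokens <= window else "exceeds window"
        print(f"{model}: {estimated_tokens:,} / {window:,} ({pct:.1f}%) {verdict}")

fit_report(1_300)  # the example count shown on this page
```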

Conversion ratios

For English prose: 1 word ≈ 1.3 tokens, 1 page (500 words) ≈ 650 tokens, 1 character ≈ 0.25 tokens, 1 minute of speech (150 wpm) ≈ 195 tokens. Code is denser — JSON with quoted keys can run 30% higher. Non-English text varies: Chinese characters often consume 1-2 tokens each, while Spanish and French sit close to English ratios.
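
The same ratios expressed as a small estimator. This is a sketch of the page's rules of thumb for English prose only; precise counts need a real tokenizer (see the FAQ below):

```python
# The page's English-prose rules of thumb as code; estimates only.
TOKENS_PER_WORD = 1.3   # 1 word ~ 1.3 tokens
WORDS_PER_PAGE = 500    # 1 page ~ 650 tokens
CHARS_PER_TOKEN = 4     # 1 character ~ 0.25 tokens
SPEECH_WPM = 150        # 1 minute of speech ~ 195 tokens

def estimate_tokens(words=0, pages=0, chars=0, minutes=0) -> int:
    total_words = words + pages * WORDS_PER_PAGE + minutes * SPEECH_WPM
    return round(total_words * TOKENS_PER_WORD + chars / CHARS_PER_TOKEN)

print(estimate_tokens(pages=200))  # ~130,000 tokens for a 200-page contract
```

At these ratios, the 200-page contract from earlier lands near 130K tokens: just over GPT-4o's 128K window but comfortably inside Claude's 200K.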

What counts toward the window?

Everything: system prompt, conversation history, retrieved documents, tool definitions, function call outputs, and the model's response. If you're building a chat product with a long system prompt and document retrieval, your effective window is much smaller than the published max. A 200K Claude window with a 5K system prompt and 50K retrieved context leaves 145K for conversation and response.
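
The budget math from that example, spelled out (all figures illustrative):

```python
# Illustrative effective-window budget from the paragraph above.
window         = 200_000  # published Claude window
system_prompt  = 5_000
retrieved_docs = 50_000

effective = window - system_prompt - retrieved_docs
print(f"Left for conversation and response: {effective:,} tokens")  # 145,000
```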

Long-context performance

Bigger windows don't always mean better answers. Most models show degraded recall and reasoning in the deep end of their windows, often past 100K-150K tokens; even Gemini 2.5 Pro starts dropping precision on needle-in-a-haystack tasks well before its 1M ceiling. RAG with focused retrieval often beats document stuffing once you cross 50K tokens.

Pair this with the token counter for raw text, the Gemini cost calculator for long-context pricing, and the RAG vs fine-tune calculator for architecture decisions. For everyday text analysis, see word counter and character counter.

Frequently Asked Questions

How many tokens is a typical book?
A 300-page novel runs roughly 100K-150K tokens. War and Peace clocks in around 750K. A short blog post is 1K-3K. A one-hour meeting transcript is about 8K-12K tokens. The 1M-token Gemini context window fits roughly 1500 pages.
What does the context window include?
Everything you send: system prompt, conversation history, retrieved documents, function definitions, tool outputs — plus the model's response. If you have a 128K window and your prompt is 120K tokens, you only have 8K left for the response. Cap max_tokens accordingly.
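
A minimal sketch of that budgeting, assuming the numbers above; the commented-out call shows where the cap would go with an OpenAI-style SDK (illustrative, not prescriptive):

```python
# Reserve response room before sending; numbers mirror the answer above.
WINDOW = 128_000
prompt_tokens = 120_000             # measure with a tokenizer beforehand
headroom = WINDOW - prompt_tokens   # 8,000 tokens left for the response

# With an OpenAI-style SDK, the cap goes on the request:
# response = client.chat.completions.create(
#     model="gpt-4o",
#     messages=messages,
#     max_tokens=headroom,
# )
```
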
Does Gemini's 1M context actually work?
Yes, but performance degrades on needle-in-a-haystack tasks past ~500K tokens. For document QA, 1M works well. For multi-hop reasoning over the full window, expect quality drops. RAG with smaller chunks often outperforms full-document stuffing.
How do I count tokens for a PDF or document?
Convert to plain text first, then estimate: roughly 1 token per 4 characters of English. PDFs often contain layout tokens (line breaks, headers) that inflate counts by 10-20%. Use the official tokenizer SDK for precise counts.
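
For example, with OpenAI's tiktoken package (o200k_base is the encoding GPT-4o uses; contract.txt is a hypothetical file of extracted text):

```python
# Exact count vs. the 4-characters-per-token heuristic.
import tiktoken  # assumes the tiktoken package is installed

text = open("contract.txt", encoding="utf-8").read()  # hypothetical extracted text
enc = tiktoken.get_encoding("o200k_base")  # the encoding GPT-4o uses

exact = len(enc.encode(text))
heuristic = len(text) // 4
print(f"exact: {exact:,}  heuristic: {heuristic:,}")
```
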
What happens if I exceed the context window?
The API returns an error and the request fails. Most SDKs surface a 'context_length_exceeded' error. Truncate the oldest messages, summarize prior turns, or move to a model with a larger window. Some frameworks (LangChain, LlamaIndex) handle this automatically.
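
A minimal truncation sketch, assuming chat messages as role/content dicts and a crude character-based count; a real implementation would use the model's own tokenizer:

```python
# Drop the oldest non-system turns until the estimate fits the budget.
def count_tokens(msg: dict) -> int:
    # Crude 4-chars-per-token estimate; swap in a real tokenizer in practice.
    return round(len(msg["content"]) / 4)

def truncate_history(messages: list[dict], budget: int) -> list[dict]:
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(map(count_tokens, system + rest)) > budget:
        rest.pop(0)  # oldest turn goes first
    return system + rest
```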