Google: Gemini 2.5 Pro Preview 05-06
Gemini 2.5 Pro Preview 05-06 is a multimodal model from Google that accepts text, images, files, audio, and video as input and produces text output. Its context window is roughly one million tokens, enough for very long documents or extended multi-turn sessions. The model supports tool calling, structured output, and a reasoning mode, which makes it applicable to agentic workflows and multi-step problem solving. At $1.25 per million input tokens and $10.00 per million output tokens, output costs eight times as much as input and sits at the higher end of the current market, so high-volume generation workloads add up quickly. Benchmark coverage is limited to two benchmarks with a blended score of 64.3, which is a narrow basis for comparison; treat performance claims as provisional until broader evaluations are available. Teams that genuinely need long-context multimodal processing, including audio and video, will find the capability set relatively broad, but cost-sensitive or output-heavy use cases should model expenses carefully before committing.
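A quick way to model expenses is to apply the two list prices to an expected token volume. The sketch below assumes the flat per-token prices quoted on this card and ignores any other fees or pricing tiers; `estimate_cost` and the workload figures are hypothetical, chosen only for illustration.

```python
# List prices quoted on this card, in USD per million tokens.
# Assumption: flat pricing with no tiers or extra fees; check current pricing before use.
INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 10.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a workload at the list prices above."""
    input_cost = input_tokens * INPUT_PRICE_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_M / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests, each with 2,000 input and 1,000 output tokens.
    requests = 10_000
    total = estimate_cost(requests * 2_000, requests * 1_000)
    output_only = estimate_cost(0, requests * 1_000)
    # Prints: total $125.00, output $100.00 (80%)
    print(f"total ${total:,.2f}, output ${output_only:,.2f} ({output_only / total:.0%})")
```

In this hypothetical workload, output tokens make up only a third of the volume but 80% of the bill, which is why output-heavy use cases deserve the closest scrutiny.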
- Model ID: google/gemini-2.5-pro-preview-05-06
- Vendor: Google
- Tokenizer: Gemini
- Input Modalities: text, image, file, audio, video
- Output Modalities: text
- Max Output: 65,535 tokens
- Tool Calling: ✓ supported
- Structured Output: ✓ supported
- Reasoning Mode: ✓ supported
- Vision: ✓ accepts images
- Audio: ✓ accepts audio
- Moderated: no
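The limits in the spec sheet can double as a local pre-flight check before sending a request. This is a minimal sketch, assuming the listed modalities and the 65,535-token output cap are current; `check_request` and its parameters are hypothetical helpers, not part of any Google or router SDK.

```python
# Limits transcribed from the spec sheet; verify against the provider before relying on them.
SUPPORTED_INPUTS = {"text", "image", "file", "audio", "video"}
SUPPORTED_OUTPUTS = {"text"}
MAX_OUTPUT_TOKENS = 65_535


def check_request(
    input_modalities: set[str],
    max_output_tokens: int,
    output_modality: str = "text",
) -> list[str]:
    """Return a list of problems; an empty list means the plan fits the listed limits."""
    problems = []
    unsupported = input_modalities - SUPPORTED_INPUTS
    if unsupported:
        problems.append(f"unsupported input modalities: {sorted(unsupported)}")
    if max_output_tokens > MAX_OUTPUT_TOKENS:
        problems.append(
            f"max_output_tokens {max_output_tokens} exceeds limit of {MAX_OUTPUT_TOKENS}"
        )
    if output_modality not in SUPPORTED_OUTPUTS:
        problems.append(f"unsupported output modality: {output_modality!r}")
    return problems


if __name__ == "__main__":
    # An audio-summarization plan with a modest output budget fits: prints []
    print(check_request({"audio", "text"}, 8_192))
    # Asking for image output with an oversized budget reports two problems.
    print(check_request({"text"}, 100_000, output_modality="image"))
```

Keeping the limits as module-level constants makes it easy to update them in one place if the provider revises the spec.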
Category rankings
Where Google: Gemini 2.5 Pro Preview 05-06 places across the 6 categories it ranks in.
| Rank | Category | Group | Models ranked | Score |
|---|---|---|---|---|
| #4 | Audio Summarization | Voice | 19 | 151 |
| #7 | Video Summarization | Video | 25 | 151 |
| #16 | Transcription | Voice | 19 | 115 |
| #16 | TTS Replacement | Voice | 19 | 115 |
| #24 | OCR / Document Parsing | Data | 25 | 138 |
| #24 | Table Extraction from PDFs | Data | 25 | 138 |