Google: Gemini 2.5 Pro Preview 05-06

Gemini 2.5 Pro Preview 05-06 is a multimodal model from Google that accepts text, images, files, audio, and video as input and produces text output. Its context window is 1,048,576 tokens (roughly one million), which accommodates very long documents or extended multi-turn sessions. The model supports tool calling, structured output, and a reasoning mode, making it applicable to agentic workflows and multi-step problem solving. At $1.25 per million input tokens and $10.00 per million output tokens, output costs eight times as much as input and sits on the higher end of the current market, so high-volume generation workloads will add up quickly. Benchmark coverage is limited to two benchmarks with a blended score of 64.3, which is a narrow basis for comparison; treat performance claims as provisional until broader evaluations are available. Teams with a genuine need for long-context multimodal processing, including audio and video, will find the capability set relatively broad, but cost-sensitive or output-heavy use cases should model expenses carefully before committing.
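To make the cost trade-off concrete, here is a minimal sketch that estimates the price of a single request from the listed per-token rates. The function name `estimate_cost` and the example token counts are illustrative assumptions, not part of any vendor API, and the sketch ignores any tiered or cached-token pricing that may apply.

```python
# Listed prices for google/gemini-2.5-pro-preview-05-06, in USD per 1M tokens.
INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 10.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Assumed workload: a 200,000-token document summarized into 4,000 tokens.
    print(f"${estimate_cost(200_000, 4_000):.2f}")
```

Because output is priced at eight times the input rate, 1,000 output tokens cost as much as 8,000 input tokens, which is why output-heavy workloads dominate the bill.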

Quality Score: 100/100 (price + capability + benchmarks)
Input Price: $1.25 per 1M tokens
Output Price: $10.00 per 1M tokens
Context Window: 1,048,576 tokens
Model ID: google/gemini-2.5-pro-preview-05-06
Vendor: google
Tokenizer: Gemini
Input Modalities: text, image, file, audio, video
Output Modalities: text
Max Output: 65,535 tokens
Tool Calling: ✓ supported
Structured Output: ✓ supported
Reasoning Mode: ✓ supported
Vision: ✓ accepts images
Audio: ✓ accepts audio
Moderated: no
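
The context-window and max-output limits above can be checked before a request is sent. The following is a rough sketch under one stated assumption: that input tokens and requested output tokens share the 1,048,576-token window. The listing does not say how the two limits interact, and the helper name `fits_budget` is hypothetical.

```python
# Limits as listed for google/gemini-2.5-pro-preview-05-06.
CONTEXT_WINDOW = 1_048_576  # tokens
MAX_OUTPUT = 65_535         # tokens


def fits_budget(input_tokens: int, requested_output_tokens: int) -> bool:
    """Check a request against the listed limits (hypothetical helper).

    Assumption: input and output tokens count against the same context window.
    """
    if requested_output_tokens > MAX_OUTPUT:
        return False
    return input_tokens + requested_output_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    print(fits_budget(900_000, 60_000))    # within both limits
    print(fits_budget(1_000_000, 65_535))  # exceeds the shared window
```

Token counts depend on the model's tokenizer, so the inputs here should come from an actual token count rather than a character-based guess.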

Category rankings

Where Google: Gemini 2.5 Pro Preview 05-06 places across the 6 categories it ranks in.

# | Category | Group | Ranked | Score
#4 | Audio Summarization | Voice | of 19 | 151
#7 | Video Summarization | Video | of 25 | 151
#16 | Transcription | Voice | of 19 | 115
#16 | TTS Replacement | Voice | of 19 | 115
#24 | OCR / Document Parsing | Data | of 25 | 138
#24 | Table Extraction from PDFs | Data | of 25 | 138
