Google: Gemini 3 Flash Preview

Gemini 3 Flash Preview is Google's multimodal model that accepts text, images, files, audio, and video as input and produces text output. Its 1,048,576-token context window accommodates long documents, extended conversations, and large media files without truncation. The model supports tool calling and a reasoning mode, which suits agentic and multi-step workflows. Structured output is listed as supported, though developers who depend on strict JSON schema enforcement should still verify it against their own schemas before committing. At $0.50 per million input tokens and $3.00 per million output tokens, it sits in the budget-to-mid price range for multimodal models. Its blended benchmark score of 44.7 comes from a single benchmark, which is too narrow a basis for firm conclusions about overall capability, so treat that figure as provisional. Teams that need broad input modality coverage at a relatively low input cost are the natural audience, and given the thin benchmark coverage, real-world testing should carry more weight than the score alone.

Quality Score: 100/100 (price + capability + benchmarks)
Input Price: $0.50 per 1M tokens
Output Price: $3.00 per 1M tokens
Context Window: 1,048,576 tokens
Model ID: google/gemini-3-flash-preview
Vendor: google
Tokenizer: Gemini
Input Modalities: text, image, file, audio, video
Output Modalities: text
Max Output: 65,535 tokens
Tool Calling: ✓ supported
Structured Output: ✓ supported
Reasoning Mode: ✓ supported
Vision: ✓ accepts images
Audio: ✓ accepts audio
Moderated: no
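
The listed prices and limits can be turned into a quick per-request estimate. The sketch below is only illustrative: the constants are copied from this listing, while the helper names `estimate_cost` and `fits_limits` are hypothetical, and the assumption that output tokens count toward the context window is not confirmed by the listing.

```python
# Figures copied from the listing; adjust them if the listing changes.
INPUT_PRICE_PER_M = 0.50     # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 3.00    # USD per 1M output tokens
CONTEXT_WINDOW = 1_048_576   # tokens
MAX_OUTPUT = 65_535          # tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request (hypothetical helper)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def fits_limits(input_tokens: int, output_tokens: int) -> bool:
    """Check a request against the listed limits.

    Assumption: input and output tokens share the context window.
    """
    within_output = output_tokens <= MAX_OUTPUT
    within_context = input_tokens + output_tokens <= CONTEXT_WINDOW
    return within_output and within_context


if __name__ == "__main__":
    # Example: a 200,000-token document summarized into 4,000 tokens.
    print(f"${estimate_cost(200_000, 4_000):.4f}")
    print(fits_limits(200_000, 4_000))
```

Output tokens cost six times as much as input tokens at these rates, so long generations dominate the bill faster than long prompts do.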

Category rankings

Where Google: Gemini 3 Flash Preview places across the 6 categories it ranks in.

Rank | Category | Group | Models ranked | Score
#5 | Transcription | Voice | 19 | 123
#6 | Audio Summarization | Voice | 19 | 145
#10 | TTS Replacement | Voice | 19 | 115
#14 | Video Summarization | Video | 25 | 145
#19 | Code Completion | Code | 25 | 132
#19 | Image Captioning | Vision | 25 | 120