Google: Gemini 3.1 Flash Lite
Gemini 3.1 Flash Lite is a multimodal model from Google that accepts text, images, video, audio, and files as input. Its context window reaches 1,048,576 tokens, making it suitable for tasks that require ingesting long documents or extended conversation histories. The model supports tool use and reasoning, which enables agentic workflows and multi-step problem solving. Maximum output is capped at 65,536 tokens per response. At $0.25 per million input tokens and $1.50 per million output tokens, it sits at the affordable end of the market, which makes it worth considering for high-volume or cost-sensitive applications. The tradeoff is that there is currently no independent benchmark coverage to validate its quality claims, so teams evaluating Gemini 3.1 Flash Lite are working without third-party performance data. Buyers who prioritize proven, measurable quality should treat it as unproven for now and run their own evals before committing.
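To make the pricing concrete, here is a minimal sketch that turns the per-million-token rates quoted above into a per-request estimate. The function name and the example workload are hypothetical, chosen only for illustration. Actual billing may differ, for example in how reasoning tokens or non-text inputs are counted.

```python
# Prices quoted in the listing above (USD per one million tokens).
INPUT_USD_PER_MTOK = 0.25
OUTPUT_USD_PER_MTOK = 1.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests of 8,000 input and 500 output tokens each.
    per_request = estimate_cost_usd(8_000, 500)
    print(f"per request: ${per_request:.5f}")
    print(f"10,000 requests: ${per_request * 10_000:.2f}")
```

Under these assumed numbers a request costs about $0.00275, or roughly $27.50 for the whole batch. Output tokens cost six times as much as input tokens, so long generations dominate the bill more than long prompts do.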
- Model ID: google/gemini-3.1-flash-lite
- Vendor: Google
- Tokenizer: Gemini
- Input Modalities: text, image, video, file, audio
- Output Modalities: text
- Max Output: 65,536 tokens
- Tool Calling: ✓ supported
- Structured Output: ✓ supported
- Reasoning Mode: ✓ supported
- Vision: ✓ accepts images
- Audio: ✓ accepts audio
- Moderated: no
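The token limits listed for this model can be checked before a request is sent. The sketch below is an illustration under stated assumptions: `check_request` is a hypothetical helper, the token counts are assumed to come from whatever tokenizer or counting endpoint you already use, and the two limits are checked independently because the listing does not say whether output tokens count against the context window.

```python
CONTEXT_WINDOW_TOKENS = 1_048_576  # context window stated in the listing
MAX_OUTPUT_TOKENS = 65_536         # per-response output cap stated in the listing


def check_request(input_tokens: int, requested_output_tokens: int) -> list[str]:
    """Return a list of limit violations; an empty list means both limits are met."""
    problems = []
    if input_tokens > CONTEXT_WINDOW_TOKENS:
        problems.append(
            f"input of {input_tokens} tokens exceeds the "
            f"{CONTEXT_WINDOW_TOKENS}-token context window"
        )
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        problems.append(
            f"requested output of {requested_output_tokens} tokens exceeds the "
            f"{MAX_OUTPUT_TOKENS}-token output cap"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical request: a large document with an oversized output budget.
    for problem in check_request(900_000, 80_000):
        print(problem)
```

Returning a list instead of raising on the first failure lets a caller report every violated limit at once.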
Category rankings
Where Google: Gemini 3.1 Flash Lite places across the 7 categories it ranks in.
| Rank | Category | Group | Models ranked | Score |
|---|---|---|---|---|
| #1 | Transcription | Voice | 19 | 123 |
| #2 | TTS Replacement | Voice | 19 | 115 |
| #3 | Video Auto-Tagging | Video | 25 | 123 |
| #9 | Audio Summarization | Voice | 19 | 139 |
| #21 | Video Summarization | Video | 25 | 139 |
| #25 | Social Media Posts | Writing | 25 | 119 |
| #25 | Voice Assistant Backend | Voice | 25 | 123 |