Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)
Nano Banana 2 is a multimodal model from Google that accepts text and image inputs, can return images as well as text, and produces up to 65,536 tokens per response within a 131,072-token context window. Per its listed specifications, it supports reasoning and structured output but not tool calling. That combination suits tasks where vision understanding and extended context matter more than function calling. At $0.50 per million input tokens and $3.00 per million output tokens, input pricing sits in a budget-to-mid range, but output costs six times as much per token, so users with output-heavy workflows should factor that in. The harder limitation for comparison purposes is that the model has no independent benchmark coverage yet, so its quality relative to alternatives is unproven. Teams comfortable evaluating an unproven model against their own use cases may find it worth testing, but those who rely on published benchmarks to shortlist candidates will want to wait for third-party results before committing.
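To make the pricing trade-off concrete, here is a minimal sketch of a per-request cost estimate using the two rates and the output cap quoted above. The function name `estimate_cost_usd` and the constant names are hypothetical, and the sketch assumes every token is billed at the flat listed rate; how generated images are counted or billed is not stated here, so treat the result as a rough estimate only.

```python
from decimal import Decimal

# Rates quoted in the description, in USD per one million tokens.
INPUT_USD_PER_MTOK = Decimal("0.50")
OUTPUT_USD_PER_MTOK = Decimal("3.00")

# Per-response output cap quoted in the description.
MAX_OUTPUT_TOKENS = 65_536


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request at the flat listed token rates.

    Assumption: all tokens are billed at the listed rates, with no
    separate image pricing, caching discounts, or tiering.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the 65,536-token response cap")

    million = Decimal(1_000_000)
    input_cost = input_tokens * INPUT_USD_PER_MTOK / million
    output_cost = output_tokens * OUTPUT_USD_PER_MTOK / million
    return input_cost + output_cost


if __name__ == "__main__":
    # 100,000 input tokens cost $0.05; 20,000 output tokens cost $0.06.
    print(estimate_cost_usd(100_000, 20_000))
```

In this example the 20,000 output tokens cost more than the 100,000 input tokens, which is the output-heavy effect the description warns about.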
- Model ID: google/gemini-3.1-flash-image-preview
- Vendor: Google
- Tokenizer: Gemini
- Input Modalities: image, text
- Output Modalities: image, text
- Max Output: 65,536 tokens
- Tool Calling: not supported
- Structured Output: ✓ supported
- Reasoning Mode: ✓ supported
- Vision: ✓ accepts images
- Audio: no
- Moderated: no
Category rankings
Where Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview) places in the one category it ranks in.
| Rank | Category | Score |
|---|---|---|
| #6 of 8 | Image Generation (Vision) | 99 |