Google: Nano Banana 2 (Gemini 3.1 Flash Image)
Nano Banana 2 (Gemini 3.1 Flash Image) is a multimodal model from Google that accepts text and image inputs, returns text and images, and produces up to 32,768 output tokens within a 131,072-token context window. It supports reasoning and is listed as supporting structured output, but it does not support tool calling, which makes it a narrower fit than general-purpose alternatives for agentic or pipeline-heavy workflows. Pricing is $0.50 per million input tokens and $3.00 per million output tokens, a competitive range for image-capable models. The catch is that no independent benchmark coverage exists yet, so no external data validates its reasoning quality or accuracy against peers. Buyers comfortable piloting an unproven model, particularly those with image-plus-text workloads and moderate output volume, may find the input price attractive; teams that need benchmark evidence before committing should wait for independent evaluations.
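To make the pricing and limits concrete, the following is a minimal sketch of a per-request cost estimate and a token-budget check using only the figures quoted above. The function names are hypothetical, and two points are assumptions rather than documented behavior: that image inputs are already counted in `input_tokens`, and that output tokens share the 131,072-token context window with the input. Image outputs may be billed differently, so treat the result as a rough planning number.

```python
# Figures quoted in the listing above.
INPUT_USD_PER_MTOK = 0.50    # USD per million input tokens
OUTPUT_USD_PER_MTOK = 3.00   # USD per million output tokens
CONTEXT_WINDOW = 131_072     # tokens
MAX_OUTPUT = 32_768          # tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed per-token rates."""
    total = (input_tokens * INPUT_USD_PER_MTOK
             + output_tokens * OUTPUT_USD_PER_MTOK)
    return total / 1_000_000


def fits_limits(input_tokens: int, output_tokens: int) -> bool:
    """Check a request against the listed output cap and context window.

    Assumption: input and output tokens both count against the context window.
    """
    return (output_tokens <= MAX_OUTPUT
            and input_tokens + output_tokens <= CONTEXT_WINDOW)


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
    print(f"{estimate_cost_usd(10_000, 2_000):.4f} USD")
    print(fits_limits(10_000, 2_000))
```

Because output tokens cost six times as much as input tokens at these rates, output volume drives the bill for generation-heavy workloads, which is why the input price matters most for workloads with moderate output volume.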
- Model ID: google/gemini-3.1-flash-image
- Vendor: Google
- Tokenizer: Gemini
- Input Modalities: image, text
- Output Modalities: image, text
- Max Output: 32,768 tokens
- Tool Calling: not supported
- Structured Output: ✓ supported
- Reasoning Mode: ✓ supported
- Vision: ✓ accepts images
- Audio: no
- Moderated: no
Category rankings
Where Google: Nano Banana 2 (Gemini 3.1 Flash Image) places in the one category it ranks in.
| Rank | Category | Score |
|---|---|---|
| #5 of 8 | Image Generation (Vision) | 99 |