Google: Nano Banana (Gemini 2.5 Flash Image)
Google Nano Banana (Gemini 2.5 Flash Image) is a multimodal model from Google that accepts both image and text as input and returns text output. It works within a 32,768-token context window and supports up to 8,192 completion tokens. The model does not support tool use, reasoning modes, or structured output, so workflows that depend on any of those capabilities will need a different option. At $0.30 per million input tokens and $2.50 per million output tokens, it sits at a low-to-mid price point for image-capable models, which makes it worth considering for teams running image-plus-text tasks at volume on a budget. The comparison caveat is significant, though: there is no independent benchmark coverage available, so its quality relative to competing models is currently unproven. Buyers who need validated performance data before committing should wait for third-party evaluations or run their own representative tests first.
- Model ID
- google/gemini-2.5-flash-image
- Vendor
- Tokenizer
- Gemini
- Input Modalities
- image, text
- Output Modalities
- image, text
- Max Output
- 8,192 tokens
- Tool Calling
- not supported
- Structured Output
- ✓ supported
- Reasoning Mode
- not supported
- Vision
- ✓ accepts images
- Audio
- no
- Moderated
- no
Category rankings
Where Google: Nano Banana (Gemini 2.5 Flash Image) places across the 1 category it ranks in. How we rank →
| # | Category | Score |
|---|---|---|
| #8 | Image GenerationVision · of 8 ranked | 82 |