Google: Gemma 4 31B (free)
Gemma 4 31B is a free multimodal model from Google that accepts text, images, and video as input. Its context window runs to 262,144 tokens, which accommodates long documents or extended conversations without truncation. Responses are capped at 8,192 output tokens. The model supports tool use and reasoning, making it usable for agentic workflows and multi-step tasks, though structured output support is unconfirmed. For comparison purposes, the zero cost makes Gemma 4 31B worth shortlisting for developers prototyping multimodal or tool-integrated applications on a tight budget. The significant caveat is that it currently has no independent benchmark coverage, so its quality relative to paid alternatives is unverified. Users willing to evaluate it against their own tasks will pay nothing to find out, but those who need a reliable quality baseline before committing should wait for third-party benchmark data to emerge.
- Model ID
- google/gemma-4-31b-it:free
- Vendor
- Tokenizer
- Gemma
- Input Modalities
- image, text, video
- Output Modalities
- text
- Max Output
- 8,192 tokens
- Tool Calling
- ✓ supported
- Structured Output
- ✓ supported
- Reasoning Mode
- ✓ supported
- Vision
- ✓ accepts images
- Audio
- no
- Moderated
- yes
Strong choice for
Social Media Posts
Voice Assistant Backend
Cheap Bulk Inference
Self-Hosted / Local
Real-Time Chat
Category rankings
Where Google: Gemma 4 31B (free) places across the 6 categories it ranks in. How we rank →
| # | Category | Score |
|---|---|---|
| #3 | Social Media PostsWriting · of 25 ranked | 120 |
| #3 | Voice Assistant BackendVoice · of 25 ranked | 124 |
| #3 | Cheap Bulk InferenceCost · of 25 ranked | 138 |
| #3 | Self-Hosted / LocalCost · of 25 ranked | 118 |
| #5 | Real-Time ChatLatency · of 25 ranked | 118 |
| #12 | Video Auto-TaggingVideo · of 25 ranked | 123 |