Google: Gemma 3 4B

Gemma 3 4B is a Google model that accepts text and image input and produces text output, with a 131,072-token context window and a 16,384-token output ceiling. It does not support tool calling or reasoning modes, so workflows that depend on function calling will need a different option. The specifications below list structured output as supported; if you rely on guaranteed response schemas, confirm that behavior with your provider before building on it. At $0.05 per million input tokens and $0.10 per million output tokens, it sits at the budget end of the market. Its blended benchmark score of 20.3 rests on only two benchmarks, which is too thin a base to treat as a reliable signal of general capability. Buyers who need a low-cost multimodal model for straightforward text and image tasks, and who can tolerate limited third-party validation, may find it worth testing. Teams with stricter performance requirements or tool-dependent pipelines should compare it against better-documented alternatives before committing.
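To make the pricing concrete, the following is a minimal sketch of a per-request cost estimate using the figures quoted above. The function and constant names are hypothetical and not part of any vendor API. The sketch also assumes that input and output tokens share the 131,072-token context window, which the listing does not state.

```python
# Figures copied from the listing; all names here are illustrative only.
INPUT_PRICE_PER_M = 0.05    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.10   # USD per 1M output tokens
CONTEXT_WINDOW = 131_072    # tokens
MAX_OUTPUT = 16_384         # tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_OUTPUT:
        raise ValueError(f"output exceeds the {MAX_OUTPUT}-token ceiling")
    # Assumption: prompt and completion together must fit in the context window.
    if input_tokens + output_tokens > CONTEXT_WINDOW:
        raise ValueError(f"request exceeds the {CONTEXT_WINDOW}-token context window")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # A 100,000-token prompt with a 2,000-token reply costs roughly half a cent.
    print(f"${estimate_cost(100_000, 2_000):.4f}")
```

Actual billing depends on the provider's tokenizer and on how image inputs are counted, so treat the result as a rough planning number.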

Quality Score: 81/100 (price + capability + benchmarks)
Input Price: $0.05 per 1M tokens
Output Price: $0.10 per 1M tokens
Context Window: 131,072 tokens
Model ID: google/gemma-3-4b-it
Vendor: google
Tokenizer: Gemini
Input Modalities: text, image
Output Modalities: text
Max Output: 16,384 tokens
Tool Calling: not supported
Structured Output: ✓ supported
Reasoning Mode: not supported
Vision: ✓ accepts images
Audio: no
Moderated: no