google

Google: Gemma 3 12B

Gemma 3 12B is a Google model that accepts both text and image inputs, making it usable for multimodal tasks without requiring a separate vision model. It supports a 131K-token context window, which is sufficient for long documents or extended conversations, and it supports tool use. It does not offer native reasoning mode, and structured output support is unconfirmed based on available data. At $0.05 per million input tokens and $0.15 per million output tokens, Gemma 3 12B sits at the budget end of the pricing spectrum. Its blended benchmark score of 3.9 comes from a single benchmark, so performance claims should be treated as preliminary rather than well-established. Developers running high-volume, cost-sensitive workloads who also need image understanding may find it worth testing, but buyers who require strong benchmark validation before committing should wait for broader coverage.

Query via API → View on google → Estimate cost

Quality Score

91/100

price + capability + benchmarks

Input Price

$0.05

per 1M tokens

Output Price

$0.15

per 1M tokens

Context Window

131,072

tokens

Model ID: google/gemma-3-12b-it
Vendor: google
Tokenizer: Gemini
Input Modalities: text, image
Output Modalities: text
Max Output: 16,384 tokens
Tool Calling: ✓ supported
Structured Output: ✓ supported
Reasoning Mode: not supported
Vision: ✓ accepts images
Audio: no
Moderated: no

Similar models

google

Google: Gemma 3 12B

Similar models

Google: Gemma 3 27B

Google: Nano Banana Pro (Gemini 3 Pro Image)

Google: Lyria 3 Pro Preview

Google: Lyria 3 Clip Preview

Google: Nano Banana 2 (Gemini 3.1 Flash Image)

Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)