google

Google: Lyria 3 Clip Preview

Google: Lyria 3 Clip Preview accepts text and image input, with a 1,048,576-token context window and a 65,536-token output ceiling. It does not support tool use or reasoning modes, so it is best suited to straightforward generative tasks that fit within those input modalities. The model is currently free, which makes it worth shortlisting for developers and researchers who want to experiment with large-context multimodal prompting at no cost. The main caveat is that no independent benchmark coverage exists yet, so its quality relative to paid alternatives is unverified. Users who need proven, measurable performance for production work should treat it as an exploratory option until third-party evaluations become available.

Quality Score: 88/100 (price + capability + benchmarks)
Input Price: Free (per 1M tokens)
Output Price: Free (per 1M tokens)
Context Window: 1,048,576 tokens

Model ID: google/lyria-3-clip-preview
Vendor: google
Tokenizer: Other
Input Modalities: text, image
Output Modalities: text, audio
Max Output: 65,536 tokens
Tool Calling: not supported
Structured Output: ✓ supported
Reasoning Mode: not supported
Vision: ✓ accepts images
Audio: no
Moderated: no
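
The limits above can be turned into a quick prompt-budget check. The following is a minimal sketch, not part of the listing: the names `ModelLimits`, `max_input_tokens`, and `fits` are hypothetical, and it assumes that input and output tokens share the 1,048,576-token context window, which the listing does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLimits:
    """Token limits copied from the listing (hypothetical container)."""
    context_window: int  # total context window, in tokens
    max_output: int      # output ceiling, in tokens


# Values taken from the spec lines for google/lyria-3-clip-preview.
LYRIA_3_CLIP_PREVIEW = ModelLimits(context_window=1_048_576, max_output=65_536)


def max_input_tokens(limits: ModelLimits, reserved_output: int) -> int:
    """Return the tokens left for the prompt after reserving output space.

    Assumption: prompt and response share one context window.
    """
    if not 0 <= reserved_output <= limits.max_output:
        raise ValueError("reserved_output must be between 0 and max_output")
    return limits.context_window - reserved_output


def fits(limits: ModelLimits, prompt_tokens: int, reserved_output: int) -> bool:
    """Check whether a prompt of the given size leaves room for the response."""
    return prompt_tokens <= max_input_tokens(limits, reserved_output)


if __name__ == "__main__":
    budget = max_input_tokens(LYRIA_3_CLIP_PREVIEW, reserved_output=65_536)
    print(f"Prompt budget with full output reserved: {budget} tokens")
```

Under that assumption, reserving the full output ceiling leaves 983,040 tokens for the prompt. Token counts for images depend on the provider's tokenizer (listed here only as "Other"), so any real check would need the provider's own counting method.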

Similar models

Google: Lyria 3 Pro Preview

Google: Nano Banana Pro (Gemini 3 Pro Image)

Google: Gemma 3 12B

Google: Gemma 3 27B

Google: Nano Banana 2 (Gemini 3.1 Flash Image)

Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)