Google: Lyria 3 Clip Preview
Google: Lyria 3 Clip Preview is a text-and-image input model with a 1-million-token context window and a 65,536-token output ceiling. It does not support tool use, reasoning modes, or structured output, so its utility is limited to straightforward generative tasks that fit within those input modalities. The model is currently free, which makes it worth shortlisting for developers and researchers who want to experiment with large-context multimodal prompting at no cost. The significant caveat is that there is no independent benchmark coverage yet, so actual quality relative to paid alternatives is unverified. Users who need proven, measurable performance for production work should treat this as an exploratory option until third-party evaluations become available.
- Model ID
- google/lyria-3-clip-preview
- Vendor
- Tokenizer
- Other
- Input Modalities
- text, image
- Output Modalities
- text, audio
- Max Output
- 65,536 tokens
- Tool Calling
- not supported
- Structured Output
- ✓ supported
- Reasoning Mode
- not supported
- Vision
- ✓ accepts images
- Audio
- no
- Moderated
- no