Z.ai: GLM 5V Turbo
GLM 5V Turbo is a multimodal model from Z.ai that accepts image, text, and video inputs and produces text output. Its context window is roughly 200K tokens, with a maximum completion of 131,072 tokens (about 131K), which makes long-document and extended-conversation workloads feasible. The model supports tool use and reasoning; the specification list also marks structured output as supported, although that capability is worth confirming against vendor documentation before relying on it. At $1.20 per million input tokens and $4.00 per million output tokens, it sits in a mid-range price tier. Its blended benchmark score of 56.8 is drawn from only one independent benchmark, so performance claims should be treated as preliminary. Teams that need video input alongside a large context window may find it worth evaluating, but buyers who prioritize broad, verified benchmark coverage should weigh that limited evidence base before committing.
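To make the pricing concrete, here is a minimal sketch of a per-request cost estimate that uses only the per-million-token prices and the maximum completion length quoted above. The helper name `estimate_cost` and the example token counts are hypothetical, and real billing may differ (for example in how image or video inputs are tokenized).

```python
# Prices quoted above, in USD per one million tokens.
INPUT_PRICE_PER_M = 1.20
OUTPUT_PRICE_PER_M = 4.00
MAX_OUTPUT_TOKENS = 131_072  # maximum completion length listed for the model


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the listed maximum completion length")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative request: a long prompt with a short answer.
    print(f"${estimate_cost(150_000, 4_000):.4f}")
```

Because output tokens cost more than three times as much as input tokens, long completions dominate the bill well before long prompts do.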
- Model ID: z-ai/glm-5v-turbo
- Vendor: z-ai
- Tokenizer: Other
- Input Modalities: image, text, video
- Output Modalities: text
- Max Output: 131,072 tokens
- Tool Calling: ✓ supported
- Structured Output: ✓ supported
- Reasoning Mode: ✓ supported
- Vision: ✓ accepts images
- Audio: no
- Moderated: no
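The listing above can be used as a quick pre-flight check before routing a request to this model. The sketch below encodes a few of the listed fields in a small record and reports which requested input types are not covered. The `ModelSpec` class and `unsupported_inputs` helper are hypothetical names introduced for illustration, not part of any vendor SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical container for fields copied from the listing above."""
    model_id: str
    input_modalities: frozenset
    output_modalities: frozenset
    max_output_tokens: int
    tool_calling: bool
    reasoning: bool


GLM_5V_TURBO = ModelSpec(
    model_id="z-ai/glm-5v-turbo",
    input_modalities=frozenset({"image", "text", "video"}),
    output_modalities=frozenset({"text"}),
    max_output_tokens=131_072,
    tool_calling=True,
    reasoning=True,
)


def unsupported_inputs(spec: ModelSpec, needed: list) -> list:
    """Return the requested input modalities the model does not list, sorted."""
    return sorted(set(needed) - spec.input_modalities)


if __name__ == "__main__":
    # Audio is listed as "no", so it is reported as unsupported.
    print(unsupported_inputs(GLM_5V_TURBO, ["video", "audio"]))
```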
Category rankings
Where Z.ai: GLM 5V Turbo places in the one category in which it is ranked.
| # | Category | Score |
|---|---|---|
| #21 of 25 ranked | Video Summarization (Video) | 139 |