z-ai

Z.ai: GLM 5V Turbo

GLM 5V Turbo is a multimodal model from Z.ai that accepts image, text, and video inputs and produces text output, giving it broader input coverage than text-only alternatives. Its context window is 202,752 tokens with a maximum completion of 131,072 tokens, which makes long-document and extended-conversation workloads feasible. The listing reports support for tool calling, reasoning mode, and structured output. At $1.20 per million input tokens and $4.00 per million output tokens, it sits in a mid-range price tier. Its blended benchmark score of 56.8 is drawn from only one independent benchmark, so performance claims should be treated as preliminary. Teams that specifically need video input alongside a large context window may find it worth evaluating, but buyers who prioritize broad, verified benchmark coverage should weigh that limited evidence base before committing.

Quality Score: 100/100 (price + capability + benchmarks)
Input Price: $1.20 per 1M tokens
Output Price: $4.00 per 1M tokens
Context Window: 202,752 tokens
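
The per-token prices above translate into a per-request cost with simple arithmetic. The following is a minimal sketch, assuming billing is linear in token counts with no caching discounts or per-image or per-video surcharges (the listing does not say either way). The function name `estimate_cost` and the example token counts are illustrative, not part of any vendor API.

```python
# Listed prices in USD per 1M tokens (taken from the listing above).
INPUT_PRICE_PER_M = 1.20
OUTPUT_PRICE_PER_M = 4.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request.

    Assumes purely linear per-token billing; real invoices may differ.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical request: 50,000 prompt tokens and 2,000 completion tokens.
print(f"${estimate_cost(50_000, 2_000):.4f}")
```

Under these assumptions, output tokens cost roughly 3.3 times as much as input tokens, so long completions dominate the bill faster than long prompts do.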

Model ID: z-ai/glm-5v-turbo
Vendor: z-ai
Tokenizer: Other
Input Modalities: image, text, video
Output Modalities: text
Max Output: 131,072 tokens
Tool Calling: ✓ supported
Structured Output: ✓ supported
Reasoning Mode: ✓ supported
Vision: ✓ accepts images
Audio: no
Moderated: no
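
The context window and max output figures interact when sizing a request. The sketch below assumes that prompt and completion tokens share the 202,752-token window, which is the usual convention but is not stated in this listing. The helper name `max_completion_budget` is hypothetical.

```python
# Limits taken from the listing; the shared-window behavior is an assumption.
CONTEXT_WINDOW = 202_752
MAX_OUTPUT = 131_072


def max_completion_budget(prompt_tokens: int) -> int:
    """Return the largest completion size that still fits the window.

    The result is capped by the model's max output and never goes below zero.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    return max(0, min(MAX_OUTPUT, remaining))


# A 100,000-token prompt leaves less room than the max output limit allows.
print(max_completion_budget(100_000))
```

If the assumption holds, any prompt longer than 71,680 tokens (202,752 minus 131,072) reduces the completion budget below the listed maximum.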

Category rankings

Where Z.ai: GLM 5V Turbo places in the one category in which it is ranked.

Rank	Category	Score
#21 of 25	Video Summarization (Video)	139

Similar models

Z.ai: GLM 4.6V

Z.ai: GLM 4.7 Flash

Z.ai: GLM 4.7

Z.ai: GLM 4.6

Z.ai: GLM 5

Z.ai: GLM 5.2