z-ai

Z.ai: GLM 4.6V

GLM 4.6V is a multimodal model from Z.ai that accepts text, images, and video as input, with a 131,072-token context window and a maximum of 32,768 output tokens. It supports tool use and reasoning, which makes it capable of agentic and multi-step workflows. Structured output support is unconfirmed, so developers who depend on guaranteed JSON schemas should verify that independently before committing. At $0.30 per million input tokens and $0.90 per million output tokens, the pricing is competitive for a model handling video alongside text and images. However, its blended benchmark score of 16.8 across only one independent benchmark offers a limited basis for quality comparison, so performance claims should be treated as provisional. Teams processing multimodal content on a moderate budget may find it worth evaluating, but those prioritizing well-documented quality should wait for broader benchmark coverage before relying on it for critical workloads.

Query via API → View on z-ai → Estimate cost

Quality Score

100/100

price + capability + benchmarks

Input Price

$0.30

per 1M tokens

Output Price

$0.90

per 1M tokens

Context Window

131,072

tokens

Model ID: z-ai/glm-4.6v
Vendor: z-ai
Tokenizer: Other
Input Modalities: image, text, video
Output Modalities: text
Max Output: 32,768 tokens
Tool Calling: ✓ supported
Structured Output: ✓ supported
Reasoning Mode: ✓ supported
Vision: ✓ accepts images
Audio: no
Moderated: no

Category rankings

Where Z.ai: GLM 4.6V places across the 3 categories it ranks in. How we rank →

#	Category	Score
#20	Social Media PostsWriting · of 25 ranked	119
#20	Voice Assistant BackendVoice · of 25 ranked	123
#21	Real-Time ChatLatency · of 25 ranked	117

Similar models

z-ai

Z.ai: GLM 4.6V

Category rankings

Similar models

Z.ai: GLM 5V Turbo

Z.ai: GLM 4.7 Flash

Z.ai: GLM 4.7

Z.ai: GLM 4.6

Z.ai: GLM 5

Z.ai: GLM 5.2