z-ai

Z.ai: GLM 4.5V

GLM 4.5V from Z.ai accepts both text and image inputs, giving it multimodal coverage within a 65,536-token context window. It supports tool use and reasoning, which positions it for agentic workflows and multi-step tasks. Structured output support is unconfirmed, so developers who depend on guaranteed JSON schemas should verify that capability before committing. Maximum output runs to 16,384 tokens per response. At $0.60 per million input tokens and $1.80 per million output tokens, GLM 4.5V sits in the budget-to-mid range for vision-capable models. Its blended benchmark score of 10.0 comes from a single benchmark, which is too narrow a sample to draw reliable conclusions about general performance. Buyers who want a low-cost multimodal option with tool and reasoning support may find it worth testing, but should treat its benchmark standing as largely unproven until broader independent evaluations are available.

Quality Score
87/100
price + capability + benchmarks
Input Price
$0.60
per 1M tokens
Output Price
$1.80
per 1M tokens
Context Window
65,536
tokens
Model ID
z-ai/glm-4.5v
Vendor
z-ai
Tokenizer
Other
Input Modalities
text, image
Output Modalities
text
Max Output
16,384 tokens
Tool Calling
✓ supported
Structured Output
✓ supported
Reasoning Mode
✓ supported
Vision
✓ accepts images
Audio
no
Moderated
no

Similar models