z-ai

Z.ai: GLM 4.5V

GLM 4.5V is a multimodal model from Z.ai that accepts both text and image inputs, making it applicable to tasks that require visual understanding alongside language processing. It supports a 65,536-token context window, up to 16,384 completion tokens, and includes tool use and reasoning capabilities. Structured output support is not confirmed, which is worth noting if your workflow depends on reliable JSON or schema-constrained responses. At $0.60 per million input tokens and $1.80 per million output tokens, GLM 4.5V sits in the budget-to-mid tier of multimodal models. However, its benchmark standing is difficult to assess with confidence; the blended score of 10.0 comes from just one benchmark, giving very limited independent coverage. Buyers who need a low-cost model with image input and tool support may find it worth testing, but those requiring well-validated performance across diverse tasks should treat its benchmarks as preliminary until broader evaluations are available.

Quality Score
87/100
price + capability + benchmarks
Input Price
$0.60
per 1M tokens
Output Price
$1.80
per 1M tokens
Context Window
65,536
tokens
Model ID
z-ai/glm-4.5v
Vendor
z-ai
Tokenizer
Other
Input Modalities
text, image
Output Modalities
text
Max Output
16,384 tokens
Tool Calling
✓ supported
Structured Output
✓ supported
Reasoning Mode
✓ supported
Vision
✓ accepts images
Audio
no
Moderated
no

Similar models