z-ai

Z.ai: GLM 4.5V

GLM 4.5V is a multimodal model from Z.ai that accepts both text and image inputs, making it applicable to tasks that require visual understanding alongside language processing. It supports a 65,536-token context window, up to 16,384 completion tokens, and includes tool use and reasoning capabilities. Structured output support is not confirmed, which is worth noting if your workflow depends on reliable JSON or schema-constrained responses. At $0.60 per million input tokens and $1.80 per million output tokens, GLM 4.5V sits in the budget-to-mid tier of multimodal models. However, its benchmark standing is difficult to assess with confidence; the blended score of 10.0 comes from just one benchmark, giving very limited independent coverage. Buyers who need a low-cost model with image input and tool support may find it worth testing, but those requiring well-validated performance across diverse tasks should treat its benchmarks as preliminary until broader evaluations are available.

Query via API → View on z-ai → Estimate cost

Quality Score

87/100

price + capability + benchmarks

Input Price

$0.60

per 1M tokens

Output Price

$1.80

per 1M tokens

Context Window

65,536

tokens

Model ID: z-ai/glm-4.5v
Vendor: z-ai
Tokenizer: Other
Input Modalities: text, image
Output Modalities: text
Max Output: 16,384 tokens
Tool Calling: ✓ supported
Structured Output: ✓ supported
Reasoning Mode: ✓ supported
Vision: ✓ accepts images
Audio: no
Moderated: no

Similar models

z-ai

Z.ai: GLM 4.5V

Similar models

Z.ai: GLM 4.5 Air

Z.ai: GLM 4.5

Z.ai: GLM 5 Turbo

Z.ai: GLM 5.2

Z.ai: GLM 5.1

Z.ai: GLM 5