Z.ai: GLM 4.5V
GLM 4.5V is a multimodal model from Z.ai that accepts both text and image inputs, making it applicable to tasks that require visual understanding alongside language processing. It supports a 65,536-token context window, up to 16,384 completion tokens, and includes tool use and reasoning capabilities. Structured output support is not confirmed, which is worth noting if your workflow depends on reliable JSON or schema-constrained responses. At $0.60 per million input tokens and $1.80 per million output tokens, GLM 4.5V sits in the budget-to-mid tier of multimodal models. However, its benchmark standing is difficult to assess with confidence; the blended score of 10.0 comes from just one benchmark, giving very limited independent coverage. Buyers who need a low-cost model with image input and tool support may find it worth testing, but those requiring well-validated performance across diverse tasks should treat its benchmarks as preliminary until broader evaluations are available.
- Model ID
- z-ai/glm-4.5v
- Vendor
- z-ai
- Tokenizer
- Other
- Input Modalities
- text, image
- Output Modalities
- text
- Max Output
- 16,384 tokens
- Tool Calling
- ✓ supported
- Structured Output
- ✓ supported
- Reasoning Mode
- ✓ supported
- Vision
- ✓ accepts images
- Audio
- no
- Moderated
- no