Z.ai: GLM 4.5V
GLM 4.5V from Z.ai accepts both text and image inputs, giving it multimodal coverage within a 65,536-token context window. It supports tool use and reasoning, which positions it for agentic workflows and multi-step tasks. Structured output support is unconfirmed, so developers who depend on guaranteed JSON schemas should verify that capability before committing. Maximum output runs to 16,384 tokens per response. At $0.60 per million input tokens and $1.80 per million output tokens, GLM 4.5V sits in the budget-to-mid range for vision-capable models. Its blended benchmark score of 10.0 comes from a single benchmark, which is too narrow a sample to draw reliable conclusions about general performance. Buyers who want a low-cost multimodal option with tool and reasoning support may find it worth testing, but should treat its benchmark standing as largely unproven until broader independent evaluations are available.
- Model ID
- z-ai/glm-4.5v
- Vendor
- z-ai
- Tokenizer
- Other
- Input Modalities
- text, image
- Output Modalities
- text
- Max Output
- 16,384 tokens
- Tool Calling
- ✓ supported
- Structured Output
- ✓ supported
- Reasoning Mode
- ✓ supported
- Vision
- ✓ accepts images
- Audio
- no
- Moderated
- no