Z.ai: GLM 4.7 Flash
GLM 4.7 Flash is a text-only model from Z.ai with a 202K-token context window and support for both tool use and reasoning. It does not accept image or audio input. The spec list below marks structured output as supported, but verify it against your own schema before relying on it. The 16,384-token output ceiling is adequate for most generation tasks; check it against your workload if you need very long completions. At $0.06 per million input tokens and $0.40 per million output tokens, it is priced in budget territory, which is the clearest argument for it. Benchmark coverage is thin: only 2 independent evaluations, with a blended score of 46.6, so treat performance claims cautiously until broader coverage exists. Teams running high-volume, text-based pipelines that need tool and reasoning support at low cost may find it worth testing. Teams that prioritize proven quality should compare it against models with fuller benchmark records before committing.
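To make the pricing concrete, here is a minimal sketch that turns the two listed rates into a cost estimate. The rates come from the summary above; the function name and the workload figures (100,000 requests, 2,000 input and 500 output tokens each) are hypothetical and chosen only for illustration.

```python
# Listed prices in USD per million tokens (from the summary above).
INPUT_USD_PER_MTOK = 0.06
OUTPUT_USD_PER_MTOK = 0.40


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical workload: these numbers are assumptions, not measurements.
requests = 100_000
input_per_request = 2_000
output_per_request = 500

cost = estimate_cost_usd(requests * input_per_request, requests * output_per_request)
print(f"Estimated cost: ${cost:.2f}")  # 200M input + 50M output tokens -> $32.00
```

Output tokens cost roughly 6.7 times as much as input tokens at these rates, so workloads with long completions (including reasoning output, if it is billed as output) will be dominated by the output side.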
- Model ID: z-ai/glm-4.7-flash
- Vendor: z-ai
- Tokenizer: Other
- Input Modalities: text
- Output Modalities: text
- Max Output: 16,384 tokens
- Tool Calling: ✓ supported
- Structured Output: ✓ supported
- Reasoning Mode: ✓ supported
- Vision: no (text only)
- Audio: no
- Moderated: no
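The output ceiling and context window can be checked before sending a request. The sketch below clamps a requested completion length to the listed limits. It assumes that "202K" means 202,000 tokens and that prompt and completion share the same window; neither is confirmed by this page, and the function name is hypothetical.

```python
MAX_OUTPUT_TOKENS = 16_384        # listed Max Output
CONTEXT_WINDOW_TOKENS = 202_000   # assumption: "202K" taken as 202,000 tokens


def clamp_max_tokens(prompt_tokens: int, requested_output: int) -> int:
    """Return the largest completion budget that fits the listed limits.

    Assumes prompt and completion tokens share one context window.
    """
    if prompt_tokens >= CONTEXT_WINDOW_TOKENS:
        raise ValueError("prompt alone fills the assumed context window")
    remaining = CONTEXT_WINDOW_TOKENS - prompt_tokens
    return min(requested_output, MAX_OUTPUT_TOKENS, remaining)


print(clamp_max_tokens(1_000, 50_000))   # capped by the output ceiling
print(clamp_max_tokens(200_000, 8_000))  # capped by the remaining context
```

Token counts here must come from whatever tokenizer the serving endpoint uses; the page lists the tokenizer only as "Other", so counts from a different tokenizer are estimates.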
Category rankings
Where Z.ai: GLM 4.7 Flash places across the 2 categories it ranks in.
| Rank | Category | Score |
|---|---|---|
| #20 of 25 | Cheap Bulk Inference (Cost) | 137 |
| #22 of 25 | Self-Hosted / Local (Cost) | 117 |