
Z.ai: GLM 4.7 Flash

GLM 4.7 Flash is a text-only model from Z.ai with a large 202K-token context window and support for both tool use and reasoning. It does not accept image or audio input. The spec sheet lists structured output as supported, but verify it against your own schemas before relying on it. The 16K output ceiling is adequate for most generation tasks, but check it against your workload if you need very long completions. At $0.06 per million input tokens and $0.40 per million output tokens, it sits in budget territory, which is the clearest argument for considering it. The benchmark picture is thin: only 2 independent evaluations, with a blended score of 46.6. Treat performance claims cautiously until broader coverage exists. Teams running high-volume, text-based pipelines that need tool and reasoning support at low cost may find it worth testing. Teams that prioritize proven quality should compare it against models with fuller benchmark records before committing.
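To see what "budget territory" means for a concrete workload, here is a minimal cost sketch. It assumes billing is linear in tokens at the listed per-million rates, and the request volume and per-request token counts are hypothetical, chosen only for illustration.

```python
# Listed prices for z-ai/glm-4.7-flash, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.06
OUTPUT_PRICE_PER_M = 0.40


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost, assuming simple linear per-token billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical pipeline: 100,000 requests, 2,000 input + 500 output tokens each.
requests = 100_000
total = estimate_cost_usd(requests * 2_000, requests * 500)
print(f"${total:.2f}")  # prints "$32.00"
```

In this example, output tokens cost more in total ($20.00) than input tokens ($12.00), even though there are four times fewer of them. For generation-heavy workloads, the $0.40 output rate dominates.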

Quality Score
99/100
price + capability + benchmarks
Input Price
$0.06
per 1M tokens
Output Price
$0.40
per 1M tokens
Context Window
202,752
tokens
Model ID
z-ai/glm-4.7-flash
Vendor
z-ai
Tokenizer
Other
Input Modalities
text
Output Modalities
text
Max Output
16,384 tokens
Tool Calling
✓ supported
Structured Output
✓ supported
Reasoning Mode
✓ supported
Vision
text only
Audio
no
Moderated
no
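The context window and max output limits interact. A long prompt can leave less room than the 16,384-token output cap. The sketch below assumes that prompt and completion tokens share the 202,752-token window, which is the common convention but is not stated in this listing. It also assumes you already have a prompt token count. Because the tokenizer is listed as "Other", that count should come from the vendor's own tokenizer. The function name is hypothetical.

```python
CONTEXT_WINDOW = 202_752  # total context, per the listing
MAX_OUTPUT = 16_384       # maximum completion tokens, per the listing


def max_completion_tokens(prompt_tokens: int) -> int:
    """Largest completion that fits, assuming prompt and output share the context window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone fills or exceeds the context window")
    # The completion is capped by both the output ceiling and the space left.
    return min(MAX_OUTPUT, remaining)


print(max_completion_tokens(1_000))    # prints 16384 (output cap binds)
print(max_completion_tokens(200_000))  # prints 2752 (context window binds)
```

For most prompts the 16,384-token output cap is the binding limit. Only prompts above roughly 186K tokens (202,752 minus 16,384) start eating into the available completion length.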

Category rankings

Where Z.ai: GLM 4.7 Flash places across the 2 categories it ranks in.

# · Category · Score
#20 · Cheap Bulk Inference (Cost · of 25 ranked) · 137
#22 · Self-Hosted / Local (Cost · of 25 ranked) · 117

Similar models