z-ai

Z.ai: GLM 4.7 Flash

GLM 4.7 Flash is a text-only model from Z.ai with a large 202K-token context window and support for both tool use and reasoning. It does not accept image or audio input, and structured output support is unconfirmed. The 16K output ceiling is adequate for most generation tasks but worth checking against your workload if you need very long completions. At $0.06 per million input tokens and $0.40 per million output tokens, it sits in budget territory, which is its clearest argument for consideration. The benchmark picture is thin, covering only 2 independent evaluations with a blended score of 46.6, so treat performance claims cautiously until broader coverage exists. Teams running high-volume, text-based pipelines who need tool and reasoning support at low cost may find it worth testing, but those prioritizing proven quality should compare it against models with fuller benchmark records before committing.

Query via API → View on z-ai → Estimate cost

Quality Score

99/100

price + capability + benchmarks

Input Price

$0.06

per 1M tokens

Output Price

$0.40

per 1M tokens

Context Window

202,752

tokens

Model ID: z-ai/glm-4.7-flash
Vendor: z-ai
Tokenizer: Other
Input Modalities: text
Output Modalities: text
Max Output: 16,384 tokens
Tool Calling: ✓ supported
Structured Output: ✓ supported
Reasoning Mode: ✓ supported
Vision: text only
Audio: no
Moderated: no

Category rankings

Where Z.ai: GLM 4.7 Flash places across the 2 categories it ranks in. How we rank →

#	Category	Score
#20	Cheap Bulk InferenceCost · of 25 ranked	137
#22	Self-Hosted / LocalCost · of 25 ranked	117

Similar models

z-ai

Z.ai: GLM 4.7 Flash

Category rankings

Similar models

Z.ai: GLM 5V Turbo

Z.ai: GLM 4.6V

Z.ai: GLM 4.7

Z.ai: GLM 4.6

Z.ai: GLM 5

Z.ai: GLM 5.2