Qwen: Qwen3.6 Flash

Qwen3.6 Flash is a multimodal model from Qwen that accepts text, image, and video inputs and supports tool use and reasoning. Its context window is 1,000,000 tokens, and output is capped at 65,536 tokens per response. Structured output is listed as supported in the specifications, but that support is unconfirmed. The combination of long context, visual input, and reasoning suits document-heavy or media-rich workflows where tool integration matters. Pricing is low relative to many multimodal reasoning models: $0.1875 per million input tokens and $1.125 per million output tokens, which keeps high-volume use affordable. The significant caveat is that Qwen3.6 Flash has no independent benchmark coverage at this time, so its quality relative to competing models is unverified. Cost-sensitive buyers willing to run their own evaluations have a reasonable case for shortlisting it; those who need validated performance data before committing should wait for third-party results.
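To make the per-token pricing concrete, here is a minimal sketch of a per-request cost estimate. The prices and limits are the ones quoted on this page; the helper name `estimate_cost` and the example token counts are hypothetical. Checking the prompt against the context window and the response against the output cap separately is a simplifying assumption, not documented provider behavior.

```python
# Prices and limits as quoted in this listing (USD, tokens).
INPUT_PRICE_PER_M = 0.1875    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 1.125    # USD per 1M output tokens
CONTEXT_WINDOW = 1_000_000    # tokens
MAX_OUTPUT = 65_536           # tokens per response


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: the prompt must fit the context window and the
    response must fit the output cap; real accounting may differ.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > CONTEXT_WINDOW:
        raise ValueError("prompt exceeds the listed context window")
    if output_tokens > MAX_OUTPUT:
        raise ValueError("response exceeds the listed output cap")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative request: a 200,000-token document and a 4,000-token summary.
    print(f"${estimate_cost(200_000, 4_000):.4f}")
```

At these rates, output tokens cost six times as much as input tokens, so long generated responses affect the bill more than long prompts do.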

Quality Score
100/100
price + capability + benchmarks
Input Price
$0.19
per 1M tokens
Output Price
$1.12
per 1M tokens
Context Window
1,000,000
tokens
Model ID
qwen/qwen3.6-flash
Vendor
qwen
Tokenizer
Qwen3
Input Modalities
text, image, video
Output Modalities
text
Max Output
65,536 tokens
Tool Calling
✓ supported
Structured Output
✓ supported
Reasoning Mode
✓ supported
Vision
✓ accepts images
Audio
no
Moderated
no

Category rankings

Where Qwen: Qwen3.6 Flash places across the 3 categories it ranks in.

#            Category                      Score
#6 of 25     Video Auto-Tagging (Video)    123
#16 of 25    Real-Time Chat (Latency)      117
#25 of 25    Video Summarization (Video)   139

Similar models