Qwen: Qwen3.6 Flash
Qwen3.6 Flash is a multimodal model from Qwen that accepts text, image, and video inputs and supports tool use and reasoning. Its context window reaches 1,000,000 tokens, and output is capped at 65,536 tokens per response. Structured output is listed as supported in the specifications. The combination of long context, visual input, and reasoning suits document-heavy or media-rich workflows where tool integration matters. Pricing is low relative to many multimodal reasoning models: $0.1875 per million input tokens and $1.125 per million output tokens, which keeps high-volume use affordable. The main caveat is that Qwen3.6 Flash has no independent benchmark coverage at this time, so its quality relative to competing models is unverified. Cost-sensitive buyers who are willing to run their own evaluations have a reasonable case for shortlisting it; those who need validated performance data before committing should wait for third-party results.
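To make the pricing concrete, here is a minimal sketch of a per-request cost estimate using the rates and limits quoted above. The function name `estimate_cost_usd` and the constants are hypothetical, and the check that input plus output must fit inside the 1,000,000-token window is an assumption, since the text does not say whether output counts against the context window.

```python
# Rates quoted in the text, in USD per million tokens.
INPUT_USD_PER_MTOK = 0.1875
OUTPUT_USD_PER_MTOK = 1.125

# Limits quoted in the text.
CONTEXT_WINDOW_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 65_536


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the per-response limit")
    # Assumption: input and output share the same context window.
    if input_tokens + output_tokens > CONTEXT_WINDOW_TOKENS:
        raise ValueError("request does not fit in the context window")
    total = (input_tokens * INPUT_USD_PER_MTOK
             + output_tokens * OUTPUT_USD_PER_MTOK)
    return total / 1_000_000


if __name__ == "__main__":
    # A 200,000-token document with a 4,000-token answer.
    print(f"${estimate_cost_usd(200_000, 4_000):.4f}")  # prints $0.0420
```

At these rates, output tokens cost six times as much as input tokens, so long reasoning traces or verbose responses can dominate the bill even when the prompt is large.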
- Model ID: qwen/qwen3.6-flash
- Vendor: qwen
- Tokenizer: Qwen3
- Input Modalities: text, image, video
- Output Modalities: text
- Max Output: 65,536 tokens
- Tool Calling: ✓ supported
- Structured Output: ✓ supported
- Reasoning Mode: ✓ supported
- Vision: ✓ accepts images
- Audio: no
- Moderated: no
Category rankings
Where Qwen: Qwen3.6 Flash places across the 3 categories it ranks in.
| # | Category | Score |
|---|---|---|
| #6 | Video Auto-Tagging (Video · of 25 ranked) | 123 |
| #16 | Real-Time Chat (Latency · of 25 ranked) | 117 |
| #25 | Video Summarization (Video · of 25 ranked) | 139 |