Best AI model for Real-Time Chat (2026)
Models suited to sub-second responses. Ranked from 346 live models in the OpenRouter catalog, weighted for low latency and low cost.
| # | Model | Model ID | Score | Input / 1M tokens | Output / 1M tokens | Context (tokens) |
|---|---|---|---|---|---|---|
| 1 | Google: Gemma 4 26B A4B (free) | `google/gemma-4-26b-a4b-it:free` | 118 | Free | Free | 262,144 |
| 2 | Google: Gemma 4 26B A4B | `google/gemma-4-26b-a4b-it` | 118 | $0.07 | $0.34 | 262,144 |
| 3 | Google: Gemma 4 31B (free) | `google/gemma-4-31b-it:free` | 118 | Free | Free | 262,144 |
| 4 | Qwen: Qwen3.5-9B | `qwen/qwen3.5-9b` | 118 | $0.10 | $0.15 | 262,144 |
| 5 | ByteDance Seed: Seed-2.0-Mini | `bytedance-seed/seed-2.0-mini` | 118 | $0.10 | $0.40 | 262,144 |
| 6 | Qwen: Qwen3.5-Flash | `qwen/qwen3.5-flash-02-23` | 118 | $0.07 | $0.26 | 1,000,000 |
| 7 | ByteDance Seed: Seed 1.6 Flash | `bytedance-seed/seed-1.6-flash` | 118 | $0.07 | $0.30 | 262,144 |
| 8 | Google: Gemini 2.5 Flash Lite Preview 09-2025 | `google/gemini-2.5-flash-lite-preview-09-2025` | 118 | $0.10 | $0.40 | 1,048,576 |
| 9 | OpenAI: GPT-5 Nano | `openai/gpt-5-nano` | 118 | $0.05 | $0.40 | 400,000 |
| 10 | Google: Gemini 2.5 Flash Lite | `google/gemini-2.5-flash-lite` | 118 | $0.10 | $0.40 | 1,048,576 |
| 11 | OpenAI: GPT-4.1 Nano | `openai/gpt-4.1-nano` | 118 | $0.10 | $0.40 | 1,047,576 |
| 12 | Google: Gemini 2.0 Flash Lite | `google/gemini-2.0-flash-lite-001` | 118 | $0.07 | $0.30 | 1,048,576 |
| 13 | Google: Gemini 2.0 Flash | `google/gemini-2.0-flash-001` | 118 | $0.10 | $0.40 | 1,048,576 |
| 14 | Google: Gemma 4 31B | `google/gemma-4-31b-it` | 117 | $0.13 | $0.38 | 262,144 |
| 15 | OpenAI: GPT-5.4 Nano | `openai/gpt-5.4-nano` | 117 | $0.20 | $1.25 | 400,000 |

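
The per-1M-token prices in the table convert to a per-turn cost with simple arithmetic. Below is a minimal sketch, assuming a hypothetical chat turn of 1,000 input tokens and 200 output tokens. The function name `turn_cost_usd` and the token counts are illustrative only; the prices are copied from three rows of the table above and may not match current billing.

```python
# USD per 1M tokens as (input, output), copied from the table above.
PRICES_PER_MILLION = {
    "openai/gpt-5-nano": (0.05, 0.40),
    "qwen/qwen3.5-9b": (0.10, 0.15),
    "google/gemma-4-26b-a4b-it": (0.07, 0.34),
}


def turn_cost_usd(model_id, input_tokens, output_tokens):
    """Estimate the USD cost of one chat turn from per-1M-token list prices."""
    in_price, out_price = PRICES_PER_MILLION[model_id]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical turn size (an assumption, not a measured average).
for model_id in PRICES_PER_MILLION:
    print(f"{model_id}: ${turn_cost_usd(model_id, 1_000, 200):.6f}")
```

Because output tokens usually cost several times more than input tokens, long replies tend to dominate the bill in chat workloads, so a model with a higher input price can still be cheaper per turn.
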
How we ranked these
For Real-Time Chat, we weight models toward low latency and low cost. A higher score is better. Scores combine OpenRouter's model metadata (context length, modality support, tool calling, structured output, reasoning capability) with public pricing.