Median output tokens generated per second.
Output Speed is an operational performance metric, not an intelligence benchmark. Artificial Analysis measures it per request as the number of output tokens received per second after the first token arrives, using OpenAI tokens as the standard unit, and reports the median across measurements.
Test type: hosted-endpoint performance measurement. Higher values mean faster streaming once the response has started.
293 models have this metric.
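The per-request measurement implied by this definition can be sketched in Python. This is an illustrative sketch only (the `output_speed` and `median` helpers and the timestamp data are hypothetical, not part of any Artificial Analysis tooling):

```python
def output_speed(token_timestamps):
    """Per-request output speed: tokens received after the first token,
    divided by the seconds elapsed since that first token arrived."""
    if len(token_timestamps) < 2:
        return 0.0
    elapsed = token_timestamps[-1] - token_timestamps[0]
    return (len(token_timestamps) - 1) / elapsed

def median(values):
    """Middle value of a sorted list (mean of the two middle values
    when the list has even length)."""
    s = sorted(values)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

# Hypothetical per-token arrival times (seconds) for three streamed requests.
requests = [
    [0.0, 0.01, 0.02, 0.03, 0.04],  # ~100 tok/s after the first token
    [0.0, 0.005, 0.010, 0.015],     # ~200 tok/s
    [0.0, 0.02, 0.04],              # ~50 tok/s
]
speeds = [output_speed(ts) for ts in requests]
print(median(speeds))  # the reported figure is the median request, here ~100 tok/s
```

Note that time-to-first-token is deliberately excluded: the clock starts when the first token arrives, so this metric isolates streaming throughput from queueing and prefill latency.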
Current leader: Mercury 2
Values come from Artificial Analysis performance data in the committed snapshot.
Top models ranked by Output Speed.
| Rank | Model | Creator | Output Speed | Blended Price |
|---|---|---|---|---|
| #1 | Mercury 2 | Inception | 820.2 tok/s | $0.375/M |
| #2 | | IBM | 410.5 tok/s | $0.085/M |
| #3 | Gemini 3.1 Flash-Lite Preview | Google | 332.5 tok/s | $0.563/M |
| #4 | Nova Micro | Amazon | 332.1 tok/s | $0.061/M |
| #5 | Sarvam 30B (high) | Sarvam | 306 tok/s | - |
| #6 | Ministral 3 3B | Mistral | 287.6 tok/s | $0.100/M |
| #7 | Qwen3.5 0.8B (Non-reasoning) | Alibaba | 273.6 tok/s | $0.020/M |
| #8 | gpt-oss-20B (low) | OpenAI | 249.7 tok/s | $0.108/M |
| #9 | Gemini 2.5 Flash-Lite (Reasoning) | Google | 243.6 tok/s | $0.175/M |
| #10 | gpt-oss-20B (high) | OpenAI | 242.3 tok/s | $0.100/M |
| #11 | Gemini 2.5 Flash-Lite (Non-reasoning) | Google | 239.9 tok/s | $0.175/M |
| #12 | Granite 4.0 H Small | IBM | 238.9 tok/s | $0.107/M |
| #13 | Qwen3.5 2B (Non-reasoning) | Alibaba | 227 tok/s | $0.040/M |
| #14 | gpt-oss-120B (low) | OpenAI | 216.3 tok/s | $0.263/M |
| #15 | Grok 3 mini Reasoning (high) | xAI | 215.5 tok/s | $0.350/M |
| #16 | Nova 2.0 Omni (Non-reasoning) | Amazon | 215.2 tok/s | $0.850/M |
| #17 | Nova 2.0 Lite (Non-reasoning) | Amazon | 214.2 tok/s | $0.850/M |
| #18 | gpt-oss-120B (high) | OpenAI | 212.3 tok/s | $0.263/M |
| #19 | GPT-5.1 Codex mini (high) | OpenAI | 207.2 tok/s | $0.688/M |
| #20 | Ling 2.6 Flash | InclusionAI | 206 tok/s | $0.150/M |
| #21 | Qwen3 0.6B (Non-reasoning) | Alibaba | 204.8 tok/s | $0.188/M |
| #22 | Qwen3.5 4B (Reasoning) | Alibaba | 204.8 tok/s | $0.060/M |
| #23 | Qwen3.5 4B (Non-reasoning) | Alibaba | 200.2 tok/s | $0.060/M |
| #24 | Gemini 2.5 Flash (Reasoning) | Google | 199.6 tok/s | $0.850/M |
| #25 | LFM2 24B A2B | Liquid AI | 196.9 tok/s | $0.052/M |
| #26 | Qwen3 0.6B (Reasoning) | Alibaba | 195.1 tok/s | $0.398/M |
| #27 | Devstral Small (Jul '25) | Mistral | 194.2 tok/s | $0.150/M |
| #28 | Gemini 3 Flash Preview (Reasoning) | Google | 193.2 tok/s | $1.13/M |
| #29 | Qwen3.6 35B A3B (Reasoning) | Alibaba | 191.8 tok/s | $0.557/M |
| #30 | Qwen3.5 Omni Flash | Alibaba | 190.4 tok/s | $0.275/M |
| #31 | Gemini 2.5 Flash (Non-reasoning) | Google | 189.1 tok/s | $0.850/M |
| #32 | Nova Lite | Amazon | 186.8 tok/s | $0.105/M |
| #33 | Nova 2.0 Lite (low) | Amazon | 185.6 tok/s | $0.850/M |
| #34 | Qwen3.6 35B A3B (Non-reasoning) | Alibaba | 185.4 tok/s | $0.844/M |
| #35 | Jamba 1.6 Mini | AI21 Labs | 184.5 tok/s | $0.250/M |
| #36 | Gemini 3 Flash Preview (Non-reasoning) | Google | 178.3 tok/s | $1.13/M |
| #37 | Qwen3 Next 80B A3B (Reasoning) | Alibaba | 172.2 tok/s | $1.88/M |
| #38 | Nova 2.0 Lite (high) | Amazon | 170.7 tok/s | $0.850/M |
| #39 | Nova 2.0 Lite (medium) | Amazon | 170.5 tok/s | $0.850/M |
| #40 | GPT-5 Codex (high) | OpenAI | 166.8 tok/s | $3.44/M |
| #41 | Llama 3.1 Instruct 8B | Meta | 164.4 tok/s | $0.100/M |
| #42 | NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | NVIDIA | 163.1 tok/s | $0.300/M |
| #43 | GPT-5.1 Codex (high) | OpenAI | 162.7 tok/s | $3.44/M |
| #44 | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | NVIDIA | 162.5 tok/s | $0.412/M |
| #45 | GPT-5.4 nano (xhigh) | OpenAI | 160.3 tok/s | $0.463/M |
| #46 | GPT-5.4 mini (medium) | OpenAI | 159.2 tok/s | $1.69/M |
| #47 | GPT-5.4 mini (xhigh) | OpenAI | 158.9 tok/s | $1.69/M |
| #48 | Qwen3 Coder Next | Alibaba | 157.8 tok/s | $0.600/M |
| #49 | Ministral 3 8B | Mistral | 157.6 tok/s | $0.150/M |
| #50 | Mistral 7B Instruct | Mistral | 156.9 tok/s | $0.250/M |
| #51 | Qwen3 Next 80B A3B Instruct | Alibaba | 155.3 tok/s | $0.875/M |
| #52 | NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | NVIDIA | 154.8 tok/s | $0.096/M |
| #53 | Mistral Small 3.2 | Mistral | 153.8 tok/s | $0.150/M |
| #54 | GPT-5.4 nano (medium) | OpenAI | 153.4 tok/s | $0.463/M |
| #55 | NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | NVIDIA | 153.3 tok/s | $0.086/M |
| #56 | GPT-5.4 mini (Non-Reasoning) | OpenAI | 152.7 tok/s | $1.69/M |
| #57 | GPT-5 nano (medium) | OpenAI | 150.3 tok/s | $0.138/M |
| #58 | GPT-5 (ChatGPT) | OpenAI | 149.8 tok/s | $3.44/M |
| #59 | Mistral Small 4 (Reasoning) | Mistral | 149.5 tok/s | $0.263/M |
| #60 | GPT-5.4 nano (Non-Reasoning) | OpenAI | 148.5 tok/s | $0.463/M |
| #61 | Qwen3 VL 8B Instruct | Alibaba | 143.4 tok/s | $0.310/M |
| #62 | Qwen3 30B A3B 2507 (Reasoning) | Alibaba | 143.2 tok/s | $0.750/M |
| #63 | Grok 4.1 Fast (Reasoning) | xAI | 140.9 tok/s | $0.275/M |
| #64 | o3-mini | OpenAI | 140.1 tok/s | $1.93/M |
| #65 | o3-mini (high) | OpenAI | 140 tok/s | $1.93/M |
| #66 | Qwen3.5 122B A10B (Reasoning) | Alibaba | 139.9 tok/s | $1.10/M |
| #67 | Mistral Small 4 (Non-reasoning) | Mistral | 139.5 tok/s | $0.263/M |
| #68 | GPT-5 nano (minimal) | OpenAI | 139.1 tok/s | $0.138/M |
| #69 | Qwen3 1.7B (Non-reasoning) | Alibaba | 139 tok/s | $0.188/M |
| #70 | Mistral Small 3.1 | Mistral | 138.8 tok/s | $0.150/M |
| #71 | Qwen3.5 35B A3B (Reasoning) | Alibaba | 137.7 tok/s | $0.688/M |
| #72 | Qwen3 1.7B (Reasoning) | Alibaba | 136.7 tok/s | $0.398/M |
| #73 | GPT-5 nano (high) | OpenAI | 136 tok/s | $0.138/M |
| #74 | Mistral Small 3 | Mistral | 135.9 tok/s | $0.150/M |
| #75 | Mistral Small (Sep '24) | Mistral | 135 tok/s | $0.300/M |
| #76 | Granite 4.1 8B | IBM | 134.6 tok/s | $0.063/M |
| #77 | Qwen3 VL 8B (Reasoning) | Alibaba | 132.7 tok/s | $0.660/M |
| #78 | Step 3.5 Flash 2603 | StepFun | 132.3 tok/s | - |
| #79 | Qwen3.5 122B A10B (Non-reasoning) | Alibaba | 131.5 tok/s | $1.10/M |
| #80 | Gemini 3.1 Pro Preview | Google | 131.2 tok/s | $4.50/M |
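The Blended Price column folds input and output per-million-token prices into a single number. A minimal sketch of how such a blend is typically computed, assuming a 3:1 input:output token weighting (both the weighting and the example prices are assumptions, not figures confirmed by this snapshot):

```python
def blended_price(input_usd_per_m, output_usd_per_m,
                  input_weight=3, output_weight=1):
    """Weighted average of input/output prices per million tokens.
    The 3:1 input:output weighting is an assumption for illustration."""
    total = input_weight + output_weight
    return (input_usd_per_m * input_weight
            + output_usd_per_m * output_weight) / total

# Hypothetical prices: $0.25/M input and $0.75/M output blend to $0.375/M.
print(blended_price(0.25, 0.75))  # → 0.375
```

Because input tokens dominate the weighting, a model with cheap input but expensive output can still show a low blended price.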