Median output tokens generated per second.
Output Speed is an operational performance metric, not an intelligence benchmark. Artificial Analysis defines it as the average number of output tokens received per second after the first token arrives, using OpenAI tokens as the standard unit.
Test type: Hosted endpoint performance measurement. Higher values mean faster streaming after response start.
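As a rough illustration of the definition above, the sketch below computes tokens per second over the window that starts when the first token arrives, then takes the median across several runs. The function names, the `(token_count, first_token_time, last_token_time)` tuple layout, and the choice to exclude the first token from the count are assumptions made for this example; they are not Artificial Analysis's published implementation.

```python
from statistics import median


def output_speed(token_count, first_token_time, last_token_time):
    """Tokens per second received after the first token arrived.

    Times are in seconds on any shared clock. Assumption: the first token
    only marks the start of the window, so it is not counted.
    """
    elapsed = last_token_time - first_token_time
    if token_count < 2 or elapsed <= 0:
        raise ValueError("need at least two tokens and a positive streaming interval")
    return (token_count - 1) / elapsed


def median_output_speed(runs):
    """Median output speed over an iterable of (count, first, last) tuples."""
    return median(output_speed(*run) for run in runs)


if __name__ == "__main__":
    # Hypothetical measurements, not real benchmark data.
    runs = [(501, 0.5, 2.5), (301, 0.5, 2.5), (401, 1.0, 3.0)]
    print(f"{median_output_speed(runs):.1f} tok/s")
```

Because the window starts at the first token, time-to-first-token latency does not affect the result; a model can be slow to start and still score a high output speed.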
300 models have this metric.
Current leader: Mercury 2
Values come from Artificial Analysis performance data in the committed snapshot.
Top models ranked by Speed.
| Rank | Model | Creator | Value | Speed | Blended Price |
|---|---|---|---|---|---|
| #1 | Mercury 2 | Inception | 758.7 tok/s | 758.7 tok/s | $0.375/M |
| #2 | | Liquid AI | 540.5 tok/s | 540.5 tok/s | - |
| #3 | LFM2.5-VL-1.6B | Liquid AI | 507.3 tok/s | 507.3 tok/s | - |
| #4 | Granite 4.0 H Small | IBM | 454.2 tok/s | 454.2 tok/s | $0.107/M |
| #5 | Granite 3.3 8B (Non-reasoning) | IBM | 400.9 tok/s | 400.9 tok/s | $0.085/M |
| #6 | Step 3.7 Flash | StepFun | 385.5 tok/s | 385.5 tok/s | $0.438/M |
| #7 | gpt-oss-120b (low) | OpenAI | 363.9 tok/s | 363.9 tok/s | $0.262/M |
| #8 | gpt-oss-120b (high) | OpenAI | 348.5 tok/s | 348.5 tok/s | $0.262/M |
| #9 | LFM2 2.6B | Liquid AI | 344.8 tok/s | 344.8 tok/s | - |
| #10 | Qwen3.5 2B (Non-reasoning) | Alibaba | 318.9 tok/s | 318.9 tok/s | $0.040/M |
| #11 | Llama 3.1 Nemotron Instruct 70B | NVIDIA | 290.5 tok/s | 290.5 tok/s | $1.20/M |
| #12 | Nova Micro | Amazon | 284.2 tok/s | 284.2 tok/s | $0.061/M |
| #13 | Gemini 3.1 Flash-Lite | Google | 281.5 tok/s | 281.5 tok/s | $0.563/M |
| #14 | Nemotron 3 Nano Omni 30B A3B Reasoning | NVIDIA | 276.7 tok/s | 276.7 tok/s | $0.131/M |
| #15 | Gemini 2.5 Flash-Lite (Reasoning) | Google | 265.2 tok/s | 265.2 tok/s | $0.175/M |
| #16 | gpt-oss-20B (high) | OpenAI | 240 tok/s | 240 tok/s | $0.088/M |
| #17 | Nova 2.0 Lite (Non-reasoning) | Amazon | 235.3 tok/s | 235.3 tok/s | $0.850/M |
| #18 | Step 3.5 Flash 2603 | StepFun | 231 tok/s | 231 tok/s | $0.150/M |
| #19 | Gemini 2.5 Flash-Lite (Non-reasoning) | Google | 229.5 tok/s | 229.5 tok/s | $0.175/M |
| #20 | LFM2.5-8B-A1B | Liquid AI | 229.3 tok/s | 229.3 tok/s | - |
| #21 | Qwen3.5 Omni Flash | Alibaba | 224.4 tok/s | 224.4 tok/s | $0.275/M |
| #22 | gpt-oss-20B (low) | OpenAI | 224.2 tok/s | 224.2 tok/s | $0.095/M |
| #23 | NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | NVIDIA | 223.7 tok/s | 223.7 tok/s | $0.300/M |
| #24 | Gemini 2.5 Flash (Reasoning) | Google | 221.3 tok/s | 221.3 tok/s | $0.850/M |
| #25 | o3-mini (high) | OpenAI | 218.5 tok/s | 218.5 tok/s | $1.93/M |
| #26 | Step 3.5 Flash | StepFun | 217.5 tok/s | 217.5 tok/s | $0.150/M |
| #27 | GPT-5.1 Codex mini (high) | OpenAI | 213.6 tok/s | 213.6 tok/s | $0.688/M |
| #28 | Command A+ | Cohere | 211.8 tok/s | 211.8 tok/s | - |
| #29 | Gemini 3.5 Flash (medium) | Google | 210.1 tok/s | 210.1 tok/s | $3.38/M |
| #30 | Qwen3.5 4B (Non-reasoning) | Alibaba | 208.9 tok/s | 208.9 tok/s | $0.060/M |
| #31 | o3-mini | OpenAI | 203.3 tok/s | 203.3 tok/s | $1.93/M |
| #32 | Gemini 3.5 Flash (high) | Google | 203.3 tok/s | 203.3 tok/s | $3.38/M |
| #33 | MiniMax-M2.5 | MiniMax | 202.9 tok/s | 202.9 tok/s | $0.525/M |
| #34 | Gemini 3.5 Flash (minimal) | Google | 202.7 tok/s | 202.7 tok/s | $3.38/M |
| #35 | Llama 3.1 Instruct 8B | Meta | 201.5 tok/s | 201.5 tok/s | $0.100/M |
| #36 | Qwen3.5 4B (Reasoning) | Alibaba | 195.8 tok/s | 195.8 tok/s | $0.060/M |
| #37 | Nova Lite | Amazon | 191.7 tok/s | 191.7 tok/s | $0.105/M |
| #38 | Nova 2.0 Lite (medium) | Amazon | 190.6 tok/s | 190.6 tok/s | $0.850/M |
| #39 | Qwen3.7 Max | Alibaba | 186.5 tok/s | 186.5 tok/s | $3.75/M |
| #40 | Jamba 1.6 Mini | AI21 Labs | 185.9 tok/s | 185.9 tok/s | $0.250/M |
| #41 | Gemini 2.5 Flash (Non-reasoning) | Google | 185.1 tok/s | 185.1 tok/s | $0.850/M |
| #42 | MiniMax-M2.1 | MiniMax | 184.6 tok/s | 184.6 tok/s | $0.525/M |
| #43 | GPT-5.1 Codex (high) | OpenAI | 182.1 tok/s | 182.1 tok/s | $3.44/M |
| #44 | Gemini 3 Flash Preview (Non-reasoning) | Google | 181.3 tok/s | 181.3 tok/s | $1.13/M |
| #45 | GPT-5.4 mini (xhigh) | OpenAI | 178.8 tok/s | 178.8 tok/s | $1.69/M |
| #46 | GPT-5.4 mini (medium) | OpenAI | 177.9 tok/s | 177.9 tok/s | $1.69/M |
| #47 | Nova 2.0 Lite (high) | Amazon | 174.7 tok/s | 174.7 tok/s | $0.850/M |
| #48 | Ministral 3 3B | Mistral | 174.3 tok/s | 174.3 tok/s | $0.100/M |
| #49 | Gemini 3 Flash Preview (Reasoning) | Google | 172.8 tok/s | 172.8 tok/s | $1.13/M |
| #50 | GPT-5 Codex (high) | OpenAI | 171.1 tok/s | 171.1 tok/s | $3.44/M |
| #51 | Grok 4.20 0309 v2 (Reasoning) | xAI | 168.7 tok/s | 168.7 tok/s | $3.00/M |
| #52 | Nova 2.0 Lite (low) | Amazon | 167.6 tok/s | 167.6 tok/s | $0.850/M |
| #53 | GPT-5 (ChatGPT) | OpenAI | 167.3 tok/s | 167.3 tok/s | $3.44/M |
| #54 | Mistral Small 4 (Reasoning) | Mistral | 167.3 tok/s | 167.3 tok/s | $0.262/M |
| #55 | GPT-5 nano (medium) | OpenAI | 167 tok/s | 167 tok/s | $0.138/M |
| #56 | Grok 4.20 0309 (Reasoning) | xAI | 166.5 tok/s | 166.5 tok/s | $3.00/M |
| #57 | GPT-5.4 mini (Non-Reasoning) | OpenAI | 165.1 tok/s | 165.1 tok/s | $1.69/M |
| #58 | Mistral Small 4 (Non-reasoning) | Mistral | 164.1 tok/s | 164.1 tok/s | $0.262/M |
| #59 | Trinity Large Thinking | Arcee AI | 162.3 tok/s | 162.3 tok/s | $0.395/M |
| #60 | Grok 4.20 0309 v2 (Non-reasoning) | xAI | 160.7 tok/s | 160.7 tok/s | $3.00/M |
| #61 | Qwen3.6 35B A3B (Reasoning) | Alibaba | 159.9 tok/s | 159.9 tok/s | $0.557/M |
| #62 | Grok 4.3 (high) | xAI | 159.7 tok/s | 159.7 tok/s | $1.56/M |
| #63 | Qwen3.6 35B A3B (Non-reasoning) | Alibaba | 159.4 tok/s | 159.4 tok/s | $0.844/M |
| #64 | Grok 4.20 0309 (Non-reasoning) | xAI | 158.9 tok/s | 158.9 tok/s | $3.00/M |
| #65 | Gemma 4 12B (Reasoning) | Google | 158.6 tok/s | 158.6 tok/s | $0.150/M |
| #66 | Mistral Small 3.1 | Mistral | 158.2 tok/s | 158.2 tok/s | $0.138/M |
| #67 | Mistral Small (Feb '24) | Mistral | 157.3 tok/s | 157.3 tok/s | $1.50/M |
| #68 | GPT-5.4 nano (Non-Reasoning) | OpenAI | 155.2 tok/s | 155.2 tok/s | $0.463/M |
| #69 | GPT-5 nano (minimal) | OpenAI | 153.9 tok/s | 153.9 tok/s | $0.138/M |
| #70 | Mistral Small 3 | Mistral | 153.7 tok/s | 153.7 tok/s | $0.104/M |
| #71 | Qwen3.5 122B A10B (Non-reasoning) | Alibaba | 152.5 tok/s | 152.5 tok/s | $1.10/M |
| #72 | Mistral Small (Sep '24) | Mistral | 151.5 tok/s | 151.5 tok/s | $0.300/M |
| #73 | o4-mini (high) | OpenAI | 151 tok/s | 151 tok/s | $1.93/M |
| #74 | GPT-5 nano (high) | OpenAI | 150.4 tok/s | 150.4 tok/s | $0.138/M |
| #75 | GPT-5.4 nano (medium) | OpenAI | 149.7 tok/s | 149.7 tok/s | $0.463/M |
| #76 | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | NVIDIA | 149.6 tok/s | 149.6 tok/s | $0.412/M |
| #77 | Grok 4.3 (low) | xAI | 148.4 tok/s | 148.4 tok/s | $1.56/M |
| #78 | Claude 4.5 Haiku (Reasoning) | Anthropic | 148.3 tok/s | 148.3 tok/s | $2.00/M |
| #79 | Sarvam 30B (high) | Sarvam | 147.9 tok/s | 147.9 tok/s | $0.047/M |
| #80 | Nova 2.0 Pro Preview (low) | Amazon | 147.9 tok/s | 147.9 tok/s | $3.44/M |