Runtime leaderboard using output speed and latency.
The domain score is the average of each model's relative percentile across the included metrics.
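As a rough illustration of how a score of this kind could be computed, the sketch below ranks each model by percentile on two metrics and averages the results. The exact percentile definition used by the leaderboard is not stated, so the "share of entries a model is at least as good as" rule is an assumption. The function names `percentile_ranks` and `domain_scores` and all sample numbers are hypothetical and are not taken from the table.

```python
from statistics import mean


def percentile_ranks(values, higher_is_better=True):
    """Return a 0-100 percentile for each value.

    Assumed definition: the share of entries (including itself) that a
    value is at least as good as. The leaderboard's real rule may differ.
    """
    n = len(values)
    if higher_is_better:
        return [100.0 * sum(other <= v for other in values) / n for v in values]
    return [100.0 * sum(other >= v for other in values) / n for v in values]


def domain_scores(metrics):
    """Average the per-metric percentiles for each model.

    `metrics` is a list of (values, higher_is_better) pairs whose value
    lists are aligned by model index.
    """
    per_metric = [percentile_ranks(vals, hib) for vals, hib in metrics]
    return [mean(model_pcts) for model_pcts in zip(*per_metric)]


# Hypothetical sample data for four models (not from the leaderboard).
speeds = [300.0, 200.0, 100.0, 50.0]   # output speed in tok/s, higher is better
latencies = [0.4, 0.3, 0.9, 1.2]       # latency in seconds, lower is better

scores = domain_scores([(speeds, True), (latencies, False)])
print(scores)  # [87.5, 87.5, 50.0, 25.0]
```

Under this definition a model that is fastest but only second-best on latency ties with one that is second-fastest but best on latency, which is the kind of trade-off an averaged percentile score smooths over.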
| Rank | Model | Creator | Domain Score | Speed | Blended Price |
|---|---|---|---|---|---|
| #1 | Qwen3.5 2B (Non-reasoning) | Alibaba | 98.8 | 318.9 tok/s | $0.040/M |
| #2 | Llama 3.1 Nemotron Instruct 70B | NVIDIA | 97.8 | 290.5 tok/s | $1.20/M |
| #3 | | Cohere | 95.9 | 211.8 tok/s | - |
| #4 | Gemini 2.5 Flash-Lite (Non-reasoning) | Google | 94.9 | 229.5 tok/s | $0.175/M |
| #5 | Qwen3.5 4B (Non-reasoning) | Alibaba | 94.7 | 208.9 tok/s | $0.060/M |
| #6 | gpt-oss-20B (high) | OpenAI | 94.2 | 272.3 tok/s | $0.088/M |
| #7 | Qwen3.5 4B (Reasoning) | Alibaba | 94.0 | 195.8 tok/s | $0.060/M |
| #8 | gpt-oss-20B (low) | OpenAI | 93.4 | 274.1 tok/s | $0.095/M |
| #9 | gpt-oss-120b (low) | OpenAI | 92.7 | 373.3 tok/s | $0.262/M |
| #10 | gpt-oss-120b (high) | OpenAI | 91.0 | 358.8 tok/s | $0.262/M |
| #11 | Llama 3.1 Instruct 8B | Meta | 87.8 | 201.5 tok/s | $0.100/M |
| #12 | NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | NVIDIA | 87.8 | 223.7 tok/s | $0.300/M |
| #13 | Nemotron 3 Nano Omni 30B A3B Reasoning | NVIDIA | 87.4 | 276.7 tok/s | $0.131/M |
| #14 | Ministral 3 3B | Mistral | 86.1 | 155.6 tok/s | $0.100/M |
| #15 | Mistral Small 4 (Non-reasoning) | Mistral | 85.7 | 171.9 tok/s | $0.262/M |
| #16 | Nova Micro | Amazon | 85.4 | 302 tok/s | $0.061/M |
| #17 | GPT-5.4 mini (Non-Reasoning) | OpenAI | 84.4 | 173.9 tok/s | $1.69/M |
| #18 | Mistral Small 3.1 | Mistral | 84.2 | 158.2 tok/s | $0.138/M |
| #19 | Step 3.7 Flash | StepFun | 84.0 | 385.5 tok/s | $0.438/M |
| #20 | Gemini 2.5 Flash (Non-reasoning) | Google | 83.8 | 185.1 tok/s | $0.850/M |
| #21 | Mistral Small (Feb '24) | Mistral | 83.7 | 157.3 tok/s | $1.50/M |
| #22 | Mistral Small (Sep '24) | Mistral | 83.0 | 151.5 tok/s | $0.300/M |
| #23 | Mistral Small 3 | Mistral | 82.8 | 153.7 tok/s | $0.104/M |
| #24 | GPT-5 (ChatGPT) | OpenAI | 82.5 | 167.3 tok/s | $3.44/M |
| #25 | Mistral Small 4 (Reasoning) | Mistral | 82.0 | 183.5 tok/s | $0.262/M |
| #26 | Grok 4.3 (Non-reasoning) | xAI | 81.8 | 173.6 tok/s | $1.56/M |
| #27 | Granite 4.1 8B | IBM | 81.3 | 134.2 tok/s | $0.063/M |
| #28 | Nova Lite | Amazon | 81.1 | 191.7 tok/s | $0.105/M |
| #29 | Tiny Aya Global | Cohere | 80.4 | 123.9 tok/s | - |
| #30 | Step 3.5 Flash 2603 | StepFun | 80.3 | 231 tok/s | $0.150/M |
| #31 | Mistral Small 3.2 | Mistral | 80.1 | 127.1 tok/s | $0.128/M |
| #32 | NVIDIA Nemotron Nano 9B V2 (Reasoning) | NVIDIA | 79.8 | 118 tok/s | $0.070/M |
| #33 | Step 3.5 Flash | StepFun | 79.1 | 217.5 tok/s | $0.150/M |
| #34 | LFM2 24B A2B | Liquid AI | 78.6 | 118.3 tok/s | $0.052/M |
| #35 | GPT-3.5 Turbo | OpenAI | 78.4 | 132.3 tok/s | $0.750/M |
| #36 | Jamba 1.6 Mini | AI21 Labs | 77.6 | 185.9 tok/s | $0.250/M |
| #37 | Ministral 3 8B | Mistral | 77.6 | 119.9 tok/s | $0.150/M |
| #38 | Gemini 3.5 Flash (minimal) | Google | 77.2 | 199.1 tok/s | $3.38/M |
| #39 | GPT-5.4 nano (Non-Reasoning) | OpenAI | 77.2 | 157.9 tok/s | $0.463/M |
| #40 | Mistral 7B Instruct | Mistral | 76.2 | 110.4 tok/s | $0.206/M |
| #41 | Nova 2.0 Lite (Non-reasoning) | Amazon | 75.7 | 202.2 tok/s | $0.850/M |
| #42 | Grok 4.20 0309 v2 (Non-reasoning) | xAI | 75.5 | 160.7 tok/s | $3.00/M |
| #43 | Magistral Small 1.2 | Mistral | 75.5 | 111.6 tok/s | $0.750/M |
| #44 | Trinity Large Thinking | Arcee AI | 75.3 | 162.3 tok/s | $0.395/M |
| #45 | Grok 4.20 0309 (Non-reasoning) | xAI | 75.3 | 158.9 tok/s | $3.00/M |
| #46 | GPT-5 nano (minimal) | OpenAI | 74.7 | 153.9 tok/s | $0.138/M |
| #47 | GPT-4o (Nov '24) | OpenAI | 74.3 | 142.1 tok/s | $4.38/M |
| #48 | Nova 2.0 Pro Preview (Non-reasoning) | Amazon | 74.3 | 159.7 tok/s | $3.44/M |
| #49 | GPT-4.1 nano | OpenAI | 74.1 | 118.2 tok/s | $0.175/M |
| #50 | Qwen3.5 0.8B (Non-reasoning) | Alibaba | 74.1 | 88 tok/s | $0.020/M |
| #51 | NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) | NVIDIA | 73.5 | 87.3 tok/s | $0.088/M |
| #52 | Qwen3.5 Omni Flash | Alibaba | 73.5 | 224.4 tok/s | $0.275/M |
| #53 | Gemini 3 Flash Preview (Non-reasoning) | Google | 73.1 | 181.3 tok/s | $1.13/M |
| #54 | GPT-4.1 | OpenAI | 73.1 | 128.3 tok/s | $3.50/M |
| #55 | Claude 4.5 Haiku (Non-reasoning) | Anthropic | 72.3 | 127.2 tok/s | $2.19/M |
| #56 | Ministral 3 14B | Mistral | 71.1 | 86 tok/s | $0.200/M |
| #57 | Llama 3 Instruct 8B | Meta | 69.4 | 88.3 tok/s | $0.070/M |
| #58 | NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | NVIDIA | 69.4 | 133.6 tok/s | $0.086/M |
| #59 | Llama 3.2 Instruct 11B (Vision) | Meta | 69.2 | 87.5 tok/s | $0.245/M |
| #60 | Llama 4 Maverick | Meta | 68.4 | 120.6 tok/s | $0.475/M |
| #61 | GPT-4o (May '24) | OpenAI | 68.0 | 92.6 tok/s | $7.50/M |
| #62 | Llama 4 Scout | Meta | 67.9 | 111.9 tok/s | $0.292/M |
| #63 | GPT-4o (Aug '24) | OpenAI | 67.2 | 94.1 tok/s | $4.38/M |
| #64 | Qwen3 VL 8B Instruct | Alibaba | 66.3 | 147.4 tok/s | $0.310/M |
| #65 | Command A | Cohere | 66.0 | 65.6 tok/s | $4.38/M |
| #66 | Kimi K2 Thinking | Kimi | 65.8 | 131.1 tok/s | $1.08/M |
| #67 | Llama 3.2 Instruct 1B | Meta | 65.5 | 92.9 tok/s | $0.050/M |
| #68 | Llama 3.3 Instruct 70B | Meta | 64.6 | 95 tok/s | $0.612/M |
| #69 | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | NVIDIA | 64.5 | 149.6 tok/s | $0.412/M |
| #70 | NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | NVIDIA | 64.3 | 133.6 tok/s | $0.096/M |
| #71 | GPT-4.1 mini | OpenAI | 64.1 | 79.3 tok/s | $0.700/M |
| #72 | Qwen3 30B A3B 2507 (Reasoning) | Alibaba | 64.1 | 139.3 tok/s | $0.673/M |
| #73 | Mistral Medium 3.1 | Mistral | 63.4 | 70 tok/s | $0.800/M |
| #74 | GPT-5.1 (Non-reasoning) | OpenAI | 62.4 | 93.1 tok/s | $3.44/M |
| #75 | Cogito v2.1 (Reasoning) | Deep Cogito | 62.4 | 62.8 tok/s | $1.25/M |
| #76 | Hermes 4 - Llama-3.1 70B (Non-reasoning) | Nous Research | 61.9 | 84.9 tok/s | $0.198/M |
| #77 | Hermes 4 - Llama-3.1 70B (Reasoning) | Nous Research | 61.9 | 87.2 tok/s | $0.198/M |
| #78 | DeepSeek V4 Flash (Non-reasoning) | DeepSeek | 61.6 | 120.2 tok/s | $0.175/M |
| #79 | Qwen3 VL 30B A3B (Reasoning) | Alibaba | 61.6 | 126.8 tok/s | $0.338/M |
| #80 | Qwen3 Next 80B A3B (Reasoning) | Alibaba | 61.4 | 135.7 tok/s | $1.88/M |