Easy Benchmarks: LLM model index

Output Speed

Median output tokens generated per second.

Output Speed is an operational performance metric, not an intelligence benchmark. Artificial Analysis defines it as the average number of output tokens received per second after the first token arrives, using OpenAI tokens as the standard unit.

Test type: Hosted endpoint performance measurement. Higher values mean faster streaming after response start.
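The definition above can be illustrated with a minimal sketch. It assumes that each request's speed is the number of tokens received after the first chunk, divided by the time elapsed since that first chunk arrived, and that the reported figure is the median across repeated requests. The function name `output_speed`, the timestamp format, and the sample runs are all hypothetical; this is not Artificial Analysis's measurement code.

```python
from statistics import median


def output_speed(arrival_times, tokens_per_chunk):
    """Tokens per second received after the first chunk arrives.

    arrival_times: seconds at which each streamed chunk was received.
    tokens_per_chunk: token count of each chunk (same length as arrival_times).
    """
    if len(arrival_times) != len(tokens_per_chunk):
        raise ValueError("inputs must have the same length")
    if len(arrival_times) < 2:
        raise ValueError("need at least two chunks to measure a rate")
    elapsed = arrival_times[-1] - arrival_times[0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    # Assumption: the first chunk's tokens are excluded because the clock
    # starts at the moment that chunk arrives.
    return sum(tokens_per_chunk[1:]) / elapsed


# Hypothetical measurements: (arrival times in seconds, tokens per chunk).
runs = [
    ([0.5, 0.75, 1.0, 1.25, 1.5], [1, 50, 50, 50, 50]),
    ([0.5, 1.0, 1.5], [1, 60, 60]),
    ([0.25, 0.75, 1.25, 1.75, 2.25], [1, 80, 80, 80, 80]),
]
speeds = [output_speed(times, counts) for times, counts in runs]
median_speed = median(speeds)
print(f"{median_speed:.1f} tok/s")
```

Time to first token does not enter the calculation, so a model with a slow start can still post a high Output Speed under this reading.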

Coverage

293 models have this metric.

Current leader: Mercury 2 (820.2 tok/s)

Project links

Values come from Artificial Analysis performance data in the committed snapshot.

Artificial Analysis methodology

Top Speed Models

Top models ranked by Speed.

Leaderboard

Rank | Model | Creator | Value | Speed | Blended Price
#1 | Mercury 2 | Inception | 820.2 tok/s | 820.2 tok/s | $0.375/M
#2 | Granite 3.3 8B (Non-reasoning) | IBM | 410.5 tok/s | 410.5 tok/s | $0.085/M
#3 | Gemini 3.1 Flash-Lite Preview | Google | 332.5 tok/s | 332.5 tok/s | $0.563/M
#4 | Nova Micro | Amazon | 332.1 tok/s | 332.1 tok/s | $0.061/M
#5 | Sarvam 30B (high) | Sarvam | 306 tok/s | 306 tok/s | -
#6 | Ministral 3 3B | Mistral | 287.6 tok/s | 287.6 tok/s | $0.100/M
#7 | Qwen3.5 0.8B (Non-reasoning) | Alibaba | 273.6 tok/s | 273.6 tok/s | $0.020/M
#8 | gpt-oss-20B (low) | OpenAI | 249.7 tok/s | 249.7 tok/s | $0.108/M
#9 | Gemini 2.5 Flash-Lite (Reasoning) | Google | 243.6 tok/s | 243.6 tok/s | $0.175/M
#10 | gpt-oss-20B (high) | OpenAI | 242.3 tok/s | 242.3 tok/s | $0.100/M
#11 | Gemini 2.5 Flash-Lite (Non-reasoning) | Google | 239.9 tok/s | 239.9 tok/s | $0.175/M
#12 | Granite 4.0 H Small | IBM | 238.9 tok/s | 238.9 tok/s | $0.107/M
#13 | Qwen3.5 2B (Non-reasoning) | Alibaba | 227 tok/s | 227 tok/s | $0.040/M
#14 | gpt-oss-120B (low) | OpenAI | 216.3 tok/s | 216.3 tok/s | $0.263/M
#15 | Grok 3 mini Reasoning (high) | xAI | 215.5 tok/s | 215.5 tok/s | $0.350/M
#16 | Nova 2.0 Omni (Non-reasoning) | Amazon | 215.2 tok/s | 215.2 tok/s | $0.850/M
#17 | Nova 2.0 Lite (Non-reasoning) | Amazon | 214.2 tok/s | 214.2 tok/s | $0.850/M
#18 | gpt-oss-120B (high) | OpenAI | 212.3 tok/s | 212.3 tok/s | $0.263/M
#19 | GPT-5.1 Codex mini (high) | OpenAI | 207.2 tok/s | 207.2 tok/s | $0.688/M
#20 | Ling 2.6 Flash | InclusionAI | 206 tok/s | 206 tok/s | $0.150/M
#21 | Qwen3 0.6B (Non-reasoning) | Alibaba | 204.8 tok/s | 204.8 tok/s | $0.188/M
#22 | Qwen3.5 4B (Reasoning) | Alibaba | 204.8 tok/s | 204.8 tok/s | $0.060/M
#23 | Qwen3.5 4B (Non-reasoning) | Alibaba | 200.2 tok/s | 200.2 tok/s | $0.060/M
#24 | Gemini 2.5 Flash (Reasoning) | Google | 199.6 tok/s | 199.6 tok/s | $0.850/M
#25 | LFM2 24B A2B | Liquid AI | 196.9 tok/s | 196.9 tok/s | $0.052/M
#26 | Qwen3 0.6B (Reasoning) | Alibaba | 195.1 tok/s | 195.1 tok/s | $0.398/M
#27 | Devstral Small (Jul '25) | Mistral | 194.2 tok/s | 194.2 tok/s | $0.150/M
#28 | Gemini 3 Flash Preview (Reasoning) | Google | 193.2 tok/s | 193.2 tok/s | $1.13/M
#29 | Qwen3.6 35B A3B (Reasoning) | Alibaba | 191.8 tok/s | 191.8 tok/s | $0.557/M
#30 | Qwen3.5 Omni Flash | Alibaba | 190.4 tok/s | 190.4 tok/s | $0.275/M
#31 | Gemini 2.5 Flash (Non-reasoning) | Google | 189.1 tok/s | 189.1 tok/s | $0.850/M
#32 | Nova Lite | Amazon | 186.8 tok/s | 186.8 tok/s | $0.105/M
#33 | Nova 2.0 Lite (low) | Amazon | 185.6 tok/s | 185.6 tok/s | $0.850/M
#34 | Qwen3.6 35B A3B (Non-reasoning) | Alibaba | 185.4 tok/s | 185.4 tok/s | $0.844/M
#35 | Jamba 1.6 Mini | AI21 Labs | 184.5 tok/s | 184.5 tok/s | $0.250/M
#36 | Gemini 3 Flash Preview (Non-reasoning) | Google | 178.3 tok/s | 178.3 tok/s | $1.13/M
#37 | Qwen3 Next 80B A3B (Reasoning) | Alibaba | 172.2 tok/s | 172.2 tok/s | $1.88/M
#38 | Nova 2.0 Lite (high) | Amazon | 170.7 tok/s | 170.7 tok/s | $0.850/M
#39 | Nova 2.0 Lite (medium) | Amazon | 170.5 tok/s | 170.5 tok/s | $0.850/M
#40 | GPT-5 Codex (high) | OpenAI | 166.8 tok/s | 166.8 tok/s | $3.44/M
#41 | Llama 3.1 Instruct 8B | Meta | 164.4 tok/s | 164.4 tok/s | $0.100/M
#42 | NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | NVIDIA | 163.1 tok/s | 163.1 tok/s | $0.300/M
#43 | GPT-5.1 Codex (high) | OpenAI | 162.7 tok/s | 162.7 tok/s | $3.44/M
#44 | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | NVIDIA | 162.5 tok/s | 162.5 tok/s | $0.412/M
#45 | GPT-5.4 nano (xhigh) | OpenAI | 160.3 tok/s | 160.3 tok/s | $0.463/M
#46 | GPT-5.4 mini (medium) | OpenAI | 159.2 tok/s | 159.2 tok/s | $1.69/M
#47 | GPT-5.4 mini (xhigh) | OpenAI | 158.9 tok/s | 158.9 tok/s | $1.69/M
#48 | Qwen3 Coder Next | Alibaba | 157.8 tok/s | 157.8 tok/s | $0.600/M
#49 | Ministral 3 8B | Mistral | 157.6 tok/s | 157.6 tok/s | $0.150/M
#50 | Mistral 7B Instruct | Mistral | 156.9 tok/s | 156.9 tok/s | $0.250/M
#51 | Qwen3 Next 80B A3B Instruct | Alibaba | 155.3 tok/s | 155.3 tok/s | $0.875/M
#52 | NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | NVIDIA | 154.8 tok/s | 154.8 tok/s | $0.096/M
#53 | Mistral Small 3.2 | Mistral | 153.8 tok/s | 153.8 tok/s | $0.150/M
#54 | GPT-5.4 nano (medium) | OpenAI | 153.4 tok/s | 153.4 tok/s | $0.463/M
#55 | NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | NVIDIA | 153.3 tok/s | 153.3 tok/s | $0.086/M
#56 | GPT-5.4 mini (Non-Reasoning) | OpenAI | 152.7 tok/s | 152.7 tok/s | $1.69/M
#57 | GPT-5 nano (medium) | OpenAI | 150.3 tok/s | 150.3 tok/s | $0.138/M
#58 | GPT-5 (ChatGPT) | OpenAI | 149.8 tok/s | 149.8 tok/s | $3.44/M
#59 | Mistral Small 4 (Reasoning) | Mistral | 149.5 tok/s | 149.5 tok/s | $0.263/M
#60 | GPT-5.4 nano (Non-Reasoning) | OpenAI | 148.5 tok/s | 148.5 tok/s | $0.463/M
#61 | Qwen3 VL 8B Instruct | Alibaba | 143.4 tok/s | 143.4 tok/s | $0.310/M
#62 | Qwen3 30B A3B 2507 (Reasoning) | Alibaba | 143.2 tok/s | 143.2 tok/s | $0.750/M
#63 | Grok 4.1 Fast (Reasoning) | xAI | 140.9 tok/s | 140.9 tok/s | $0.275/M
#64 | o3-mini | OpenAI | 140.1 tok/s | 140.1 tok/s | $1.93/M
#65 | o3-mini (high) | OpenAI | 140 tok/s | 140 tok/s | $1.93/M
#66 | Qwen3.5 122B A10B (Reasoning) | Alibaba | 139.9 tok/s | 139.9 tok/s | $1.10/M
#67 | Mistral Small 4 (Non-reasoning) | Mistral | 139.5 tok/s | 139.5 tok/s | $0.263/M
#68 | GPT-5 nano (minimal) | OpenAI | 139.1 tok/s | 139.1 tok/s | $0.138/M
#69 | Qwen3 1.7B (Non-reasoning) | Alibaba | 139 tok/s | 139 tok/s | $0.188/M
#70 | Mistral Small 3.1 | Mistral | 138.8 tok/s | 138.8 tok/s | $0.150/M
#71 | Qwen3.5 35B A3B (Reasoning) | Alibaba | 137.7 tok/s | 137.7 tok/s | $0.688/M
#72 | Qwen3 1.7B (Reasoning) | Alibaba | 136.7 tok/s | 136.7 tok/s | $0.398/M
#73 | GPT-5 nano (high) | OpenAI | 136 tok/s | 136 tok/s | $0.138/M
#74 | Mistral Small 3 | Mistral | 135.9 tok/s | 135.9 tok/s | $0.150/M
#75 | Mistral Small (Sep '24) | Mistral | 135 tok/s | 135 tok/s | $0.300/M
#76 | Granite 4.1 8B | IBM | 134.6 tok/s | 134.6 tok/s | $0.063/M
#77 | Qwen3 VL 8B (Reasoning) | Alibaba | 132.7 tok/s | 132.7 tok/s | $0.660/M
#78 | Step 3.5 Flash 2603 | StepFun | 132.3 tok/s | 132.3 tok/s | -
#79 | Qwen3.5 122B A10B (Non-reasoning) | Alibaba | 131.5 tok/s | 131.5 tok/s | $1.10/M
#80 | Gemini 3.1 Pro Preview | Google | 131.2 tok/s | 131.2 tok/s | $4.50/M