Easy Benchmarks

Output Speed

Median output tokens generated per second.

Output Speed is an operational performance metric, not an intelligence benchmark. Artificial Analysis defines it as the average number of output tokens received per second after the first token arrives, using OpenAI tokens as the standard unit.

Test type: Hosted endpoint performance measurement. Higher values mean faster streaming after response start.
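As a rough illustration of the definition above, the sketch below computes a per-response output speed (tokens received after the first token, divided by the time elapsed since the first token) and then takes the median across several responses. The function names, the `(tokens, first_token_time, last_token_time)` sample layout, and the choice to exclude the first token from the count are assumptions made for this example; they are not taken from the Artificial Analysis methodology.

```python
from statistics import median


def output_speed(tokens_after_first: int,
                 first_token_time: float,
                 last_token_time: float) -> float:
    """Tokens per second for one streamed response, measured from first-token arrival.

    Hypothetical helper: times are seconds since the request was sent.
    """
    elapsed = last_token_time - first_token_time
    if elapsed <= 0:
        raise ValueError("last token must arrive after the first token")
    return tokens_after_first / elapsed


def median_output_speed(samples) -> float:
    """Median of per-response speeds; each sample is (tokens, first_time, last_time)."""
    return median(output_speed(*sample) for sample in samples)


# Made-up measurements for illustration only (not real benchmark data).
samples = [
    (500, 0.5, 2.5),    # 500 tokens over 2.0 s
    (300, 0.5, 2.0),    # 300 tokens over 1.5 s
    (450, 0.25, 1.75),  # 450 tokens over 1.5 s
]
print(f"{median_output_speed(samples):.1f} tok/s")
```

Because the clock starts at the first token, time-to-first-token latency does not affect this figure; a slow-to-start endpoint can still report a high output speed.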

Coverage

300 models have this metric.

Current leader: Mercury 2 (758.7 tok/s)

Project links

Values come from Artificial Analysis performance data in the committed snapshot.

Artificial Analysis methodology

Top Speed Models

Top models ranked by Speed.

Leaderboard

| Rank | Model | Creator | Value | Speed | Blended Price |
|---|---|---|---|---|---|
| #1 | Mercury 2 | Inception | 758.7 tok/s | 758.7 tok/s | $0.375/M |
| #2 | LFM2.5-1.2B-Instruct | Liquid AI | 540.5 tok/s | 540.5 tok/s | - |
| #3 | LFM2.5-VL-1.6B | Liquid AI | 507.3 tok/s | 507.3 tok/s | - |
| #4 | Granite 4.0 H Small | IBM | 454.2 tok/s | 454.2 tok/s | $0.107/M |
| #5 | Granite 3.3 8B (Non-reasoning) | IBM | 400.9 tok/s | 400.9 tok/s | $0.085/M |
| #6 | Step 3.7 Flash | StepFun | 385.5 tok/s | 385.5 tok/s | $0.438/M |
| #7 | gpt-oss-120b (low) | OpenAI | 363.9 tok/s | 363.9 tok/s | $0.262/M |
| #8 | gpt-oss-120b (high) | OpenAI | 348.5 tok/s | 348.5 tok/s | $0.262/M |
| #9 | LFM2 2.6B | Liquid AI | 344.8 tok/s | 344.8 tok/s | - |
| #10 | Qwen3.5 2B (Non-reasoning) | Alibaba | 318.9 tok/s | 318.9 tok/s | $0.040/M |
| #11 | Llama 3.1 Nemotron Instruct 70B | NVIDIA | 290.5 tok/s | 290.5 tok/s | $1.20/M |
| #12 | Nova Micro | Amazon | 284.2 tok/s | 284.2 tok/s | $0.061/M |
| #13 | Gemini 3.1 Flash-Lite | Google | 281.5 tok/s | 281.5 tok/s | $0.563/M |
| #14 | Nemotron 3 Nano Omni 30B A3B Reasoning | NVIDIA | 276.7 tok/s | 276.7 tok/s | $0.131/M |
| #15 | Gemini 2.5 Flash-Lite (Reasoning) | Google | 265.2 tok/s | 265.2 tok/s | $0.175/M |
| #16 | gpt-oss-20B (high) | OpenAI | 240 tok/s | 240 tok/s | $0.088/M |
| #17 | Nova 2.0 Lite (Non-reasoning) | Amazon | 235.3 tok/s | 235.3 tok/s | $0.850/M |
| #18 | Step 3.5 Flash 2603 | StepFun | 231 tok/s | 231 tok/s | $0.150/M |
| #19 | Gemini 2.5 Flash-Lite (Non-reasoning) | Google | 229.5 tok/s | 229.5 tok/s | $0.175/M |
| #20 | LFM2.5-8B-A1B | Liquid AI | 229.3 tok/s | 229.3 tok/s | - |
| #21 | Qwen3.5 Omni Flash | Alibaba | 224.4 tok/s | 224.4 tok/s | $0.275/M |
| #22 | gpt-oss-20B (low) | OpenAI | 224.2 tok/s | 224.2 tok/s | $0.095/M |
| #23 | NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | NVIDIA | 223.7 tok/s | 223.7 tok/s | $0.300/M |
| #24 | Gemini 2.5 Flash (Reasoning) | Google | 221.3 tok/s | 221.3 tok/s | $0.850/M |
| #25 | o3-mini (high) | OpenAI | 218.5 tok/s | 218.5 tok/s | $1.93/M |
| #26 | Step 3.5 Flash | StepFun | 217.5 tok/s | 217.5 tok/s | $0.150/M |
| #27 | GPT-5.1 Codex mini (high) | OpenAI | 213.6 tok/s | 213.6 tok/s | $0.688/M |
| #28 | Command A+ | Cohere | 211.8 tok/s | 211.8 tok/s | - |
| #29 | Gemini 3.5 Flash (medium) | Google | 210.1 tok/s | 210.1 tok/s | $3.38/M |
| #30 | Qwen3.5 4B (Non-reasoning) | Alibaba | 208.9 tok/s | 208.9 tok/s | $0.060/M |
| #31 | o3-mini | OpenAI | 203.3 tok/s | 203.3 tok/s | $1.93/M |
| #32 | Gemini 3.5 Flash (high) | Google | 203.3 tok/s | 203.3 tok/s | $3.38/M |
| #33 | MiniMax-M2.5 | MiniMax | 202.9 tok/s | 202.9 tok/s | $0.525/M |
| #34 | Gemini 3.5 Flash (minimal) | Google | 202.7 tok/s | 202.7 tok/s | $3.38/M |
| #35 | Llama 3.1 Instruct 8B | Meta | 201.5 tok/s | 201.5 tok/s | $0.100/M |
| #36 | Qwen3.5 4B (Reasoning) | Alibaba | 195.8 tok/s | 195.8 tok/s | $0.060/M |
| #37 | Nova Lite | Amazon | 191.7 tok/s | 191.7 tok/s | $0.105/M |
| #38 | Nova 2.0 Lite (medium) | Amazon | 190.6 tok/s | 190.6 tok/s | $0.850/M |
| #39 | Qwen3.7 Max | Alibaba | 186.5 tok/s | 186.5 tok/s | $3.75/M |
| #40 | Jamba 1.6 Mini | AI21 Labs | 185.9 tok/s | 185.9 tok/s | $0.250/M |
| #41 | Gemini 2.5 Flash (Non-reasoning) | Google | 185.1 tok/s | 185.1 tok/s | $0.850/M |
| #42 | MiniMax-M2.1 | MiniMax | 184.6 tok/s | 184.6 tok/s | $0.525/M |
| #43 | GPT-5.1 Codex (high) | OpenAI | 182.1 tok/s | 182.1 tok/s | $3.44/M |
| #44 | Gemini 3 Flash Preview (Non-reasoning) | Google | 181.3 tok/s | 181.3 tok/s | $1.13/M |
| #45 | GPT-5.4 mini (xhigh) | OpenAI | 178.8 tok/s | 178.8 tok/s | $1.69/M |
| #46 | GPT-5.4 mini (medium) | OpenAI | 177.9 tok/s | 177.9 tok/s | $1.69/M |
| #47 | Nova 2.0 Lite (high) | Amazon | 174.7 tok/s | 174.7 tok/s | $0.850/M |
| #48 | Ministral 3 3B | Mistral | 174.3 tok/s | 174.3 tok/s | $0.100/M |
| #49 | Gemini 3 Flash Preview (Reasoning) | Google | 172.8 tok/s | 172.8 tok/s | $1.13/M |
| #50 | GPT-5 Codex (high) | OpenAI | 171.1 tok/s | 171.1 tok/s | $3.44/M |
| #51 | Grok 4.20 0309 v2 (Reasoning) | xAI | 168.7 tok/s | 168.7 tok/s | $3.00/M |
| #52 | Nova 2.0 Lite (low) | Amazon | 167.6 tok/s | 167.6 tok/s | $0.850/M |
| #53 | GPT-5 (ChatGPT) | OpenAI | 167.3 tok/s | 167.3 tok/s | $3.44/M |
| #54 | Mistral Small 4 (Reasoning) | Mistral | 167.3 tok/s | 167.3 tok/s | $0.262/M |
| #55 | GPT-5 nano (medium) | OpenAI | 167 tok/s | 167 tok/s | $0.138/M |
| #56 | Grok 4.20 0309 (Reasoning) | xAI | 166.5 tok/s | 166.5 tok/s | $3.00/M |
| #57 | GPT-5.4 mini (Non-Reasoning) | OpenAI | 165.1 tok/s | 165.1 tok/s | $1.69/M |
| #58 | Mistral Small 4 (Non-reasoning) | Mistral | 164.1 tok/s | 164.1 tok/s | $0.262/M |
| #59 | Trinity Large Thinking | Arcee AI | 162.3 tok/s | 162.3 tok/s | $0.395/M |
| #60 | Grok 4.20 0309 v2 (Non-reasoning) | xAI | 160.7 tok/s | 160.7 tok/s | $3.00/M |
| #61 | Qwen3.6 35B A3B (Reasoning) | Alibaba | 159.9 tok/s | 159.9 tok/s | $0.557/M |
| #62 | Grok 4.3 (high) | xAI | 159.7 tok/s | 159.7 tok/s | $1.56/M |
| #63 | Qwen3.6 35B A3B (Non-reasoning) | Alibaba | 159.4 tok/s | 159.4 tok/s | $0.844/M |
| #64 | Grok 4.20 0309 (Non-reasoning) | xAI | 158.9 tok/s | 158.9 tok/s | $3.00/M |
| #65 | Gemma 4 12B (Reasoning) | Google | 158.6 tok/s | 158.6 tok/s | $0.150/M |
| #66 | Mistral Small 3.1 | Mistral | 158.2 tok/s | 158.2 tok/s | $0.138/M |
| #67 | Mistral Small (Feb '24) | Mistral | 157.3 tok/s | 157.3 tok/s | $1.50/M |
| #68 | GPT-5.4 nano (Non-Reasoning) | OpenAI | 155.2 tok/s | 155.2 tok/s | $0.463/M |
| #69 | GPT-5 nano (minimal) | OpenAI | 153.9 tok/s | 153.9 tok/s | $0.138/M |
| #70 | Mistral Small 3 | Mistral | 153.7 tok/s | 153.7 tok/s | $0.104/M |
| #71 | Qwen3.5 122B A10B (Non-reasoning) | Alibaba | 152.5 tok/s | 152.5 tok/s | $1.10/M |
| #72 | Mistral Small (Sep '24) | Mistral | 151.5 tok/s | 151.5 tok/s | $0.300/M |
| #73 | o4-mini (high) | OpenAI | 151 tok/s | 151 tok/s | $1.93/M |
| #74 | GPT-5 nano (high) | OpenAI | 150.4 tok/s | 150.4 tok/s | $0.138/M |
| #75 | GPT-5.4 nano (medium) | OpenAI | 149.7 tok/s | 149.7 tok/s | $0.463/M |
| #76 | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | NVIDIA | 149.6 tok/s | 149.6 tok/s | $0.412/M |
| #77 | Grok 4.3 (low) | xAI | 148.4 tok/s | 148.4 tok/s | $1.56/M |
| #78 | Claude 4.5 Haiku (Reasoning) | Anthropic | 148.3 tok/s | 148.3 tok/s | $2.00/M |
| #79 | Sarvam 30B (high) | Sarvam | 147.9 tok/s | 147.9 tok/s | $0.047/M |
| #80 | Nova 2.0 Pro Preview (low) | Amazon | 147.9 tok/s | 147.9 tok/s | $3.44/M |