Easy Benchmarks

Time to First Token

Median seconds until the first output token.

Time to First Token measures perceived responsiveness for streamed responses. Artificial Analysis defines it as the time from request submission to the first received token, with lower values ranking better in this app.

Test type: Hosted endpoint latency measurement. Lower values mean the response starts sooner.
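The definition above (time from request submission to the first received token, reported as a median) can be sketched in a few lines. This is only an illustration, not Artificial Analysis's actual harness: `fake_stream` is a hypothetical stand-in for a hosted streaming endpoint, and the function names and number of runs are assumptions.

```python
import statistics
import time
from typing import Callable, Iterable


def time_to_first_token(start_stream: Callable[[], Iterable[str]]) -> float:
    """Return seconds from submitting the request to receiving the first token."""
    start = time.perf_counter()           # clock starts at request submission
    stream = iter(start_stream())         # hypothetical: opens the streamed response
    first = next(stream, None)            # blocks until the first token arrives
    if first is None:
        raise ValueError("stream ended before any token was received")
    return time.perf_counter() - start


def median_ttft(start_stream: Callable[[], Iterable[str]], runs: int = 5) -> float:
    """Median TTFT over several independent requests."""
    return statistics.median(time_to_first_token(start_stream) for _ in range(runs))


def fake_stream(delay_s: float = 0.05):
    """Hypothetical endpoint: waits delay_s, then streams two tokens."""
    time.sleep(delay_s)                   # simulated queueing, network and prefill time
    yield "Hello"
    yield " world"


if __name__ == "__main__":
    print(f"median TTFT: {median_ttft(fake_stream):.3f}s")
```

A median is used rather than a mean so that a single slow request does not dominate the reported value.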

Coverage

300 models have this metric.

Current leader: Command A+ (0.18s)

Project links

Values come from Artificial Analysis performance data in the committed snapshot.

Artificial Analysis methodology

Top TTFT Models

Top models ranked by TTFT.
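Because lower values rank better, the ordering amounts to an ascending sort on TTFT. A minimal sketch, using three values taken from the leaderboard; the `Entry` type and `rank_by_ttft` helper are hypothetical names, and how the app breaks ties between equal displayed values is not stated, so this sketch simply keeps input order for ties.

```python
from typing import List, NamedTuple, Tuple


class Entry(NamedTuple):
    model: str
    ttft_s: float  # median time to first token, in seconds


def rank_by_ttft(entries: List[Entry]) -> List[Tuple[int, Entry]]:
    """Assign ranks starting at 1, lowest TTFT first (stable for ties)."""
    ordered = sorted(entries, key=lambda e: e.ttft_s)
    return list(enumerate(ordered, start=1))


rows = [
    Entry("Qwen3.5 4B (Reasoning)", 0.22),
    Entry("Command A+", 0.18),
    Entry("Qwen3.5 2B (Non-reasoning)", 0.21),
]

ranked = rank_by_ttft(rows)
for rank, entry in ranked:
    print(f"#{rank} {entry.model} {entry.ttft_s:.2f}s")
```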

Leaderboard

| Rank | Model | Creator | Value (TTFT) | Speed | Blended Price |
|---|---|---|---|---|---|
| #1 | Command A+ | Cohere | 0.18s | 211.8 tok/s | - |
| #2 | Qwen3.5 2B (Non-reasoning) | Alibaba | 0.21s | 318.9 tok/s | $0.040/M |
| #3 | Qwen3.5 4B (Reasoning) | Alibaba | 0.22s | 195.8 tok/s | $0.060/M |
| #4 | NVIDIA Nemotron Nano 9B V2 (Reasoning) | NVIDIA | 0.23s | 118 tok/s | $0.070/M |
| #5 | Llama 3.1 Nemotron Instruct 70B | NVIDIA | 0.25s | 290.5 tok/s | $1.20/M |
| #6 | Qwen3.5 4B (Non-reasoning) | Alibaba | 0.25s | 208.9 tok/s | $0.060/M |
| #7 | Qwen3.5 0.8B (Non-reasoning) | Alibaba | 0.25s | 88 tok/s | $0.020/M |
| #8 | NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) | NVIDIA | 0.26s | 87.3 tok/s | $0.088/M |
| #9 | LFM2 24B A2B | Liquid AI | 0.28s | 127.6 tok/s | $0.052/M |
| #10 | Llama Nemotron Super 49B v1.5 (Reasoning) | NVIDIA | 0.30s | 44.2 tok/s | $0.175/M |
| #11 | Llama Nemotron Super 49B v1.5 (Non-reasoning) | NVIDIA | 0.30s | 43.9 tok/s | $0.175/M |
| #12 | Tiny Aya Global | Cohere | 0.32s | 123.9 tok/s | - |
| #13 | DeepSeek R1 Distill Llama 70B | DeepSeek | 0.32s | 44.7 tok/s | $0.787/M |
| #14 | Command A | Cohere | 0.33s | 65.6 tok/s | $4.38/M |
| #15 | Gemini 2.5 Flash-Lite (Non-reasoning) | Google | 0.36s | 229.5 tok/s | $0.175/M |
| #16 | Mistral Small 3.2 | Mistral | 0.36s | 127.1 tok/s | $0.128/M |
| #17 | Mistral 7B Instruct | Mistral | 0.38s | 110.4 tok/s | $0.206/M |
| #18 | Granite 4.1 8B | IBM | 0.38s | 134.2 tok/s | $0.063/M |
| #19 | Ministral 3 3B | Mistral | 0.39s | 174.3 tok/s | $0.100/M |
| #20 | Phi-4 Multimodal Instruct | Microsoft | 0.39s | 12.3 tok/s | - |
| #21 | Hermes 3 - Llama-3.1 70B | Nous Research | 0.41s | 34.1 tok/s | $0.300/M |
| #22 | Ministral 3 14B | Mistral | 0.43s | 106.9 tok/s | $0.200/M |
| #23 | Magistral Small 1.2 | Mistral | 0.43s | 110.9 tok/s | $0.750/M |
| #24 | Ministral 3 8B | Mistral | 0.43s | 103.6 tok/s | $0.150/M |
| #25 | Gemma 3n E4B Instruct | Google | 0.44s | 50 tok/s | $0.025/M |
| #26 | Cogito v2.1 (Reasoning) | Deep Cogito | 0.46s | 62.8 tok/s | $1.25/M |
| #27 | Mistral Small (Sep '24) | Mistral | 0.46s | 151.5 tok/s | $0.300/M |
| #28 | Mistral Small 3.1 | Mistral | 0.47s | 158.2 tok/s | $0.138/M |
| #29 | gpt-oss-20B (low) | OpenAI | 0.48s | 224.2 tok/s | $0.095/M |
| #30 | Mistral Small (Feb '24) | Mistral | 0.48s | 157.3 tok/s | $1.50/M |
| #31 | Mistral Small 3 | Mistral | 0.48s | 153.7 tok/s | $0.104/M |
| #32 | QwQ 32B | Alibaba | 0.48s | 31 tok/s | $0.745/M |
| #33 | gpt-oss-20B (high) | OpenAI | 0.48s | 240 tok/s | $0.088/M |
| #34 | GPT-3.5 Turbo | OpenAI | 0.49s | 132.3 tok/s | $0.750/M |
| #35 | Llama 3 Instruct 8B | Meta | 0.49s | 88.3 tok/s | $0.070/M |
| #36 | gpt-oss-120b (low) | OpenAI | 0.50s | 363.9 tok/s | $0.262/M |
| #37 | Mistral Medium 3.1 | Mistral | 0.51s | 70 tok/s | $0.800/M |
| #38 | GPT-4.1 nano | OpenAI | 0.51s | 118.2 tok/s | $0.175/M |
| #39 | Devstral Small (Jul '25) | Mistral | 0.51s | 42.3 tok/s | $0.150/M |
| #40 | Mistral Medium | Mistral | 0.51s | 66.1 tok/s | $4.09/M |
| #41 | Llama 3.1 Instruct 8B | Meta | 0.51s | 201.5 tok/s | $0.100/M |
| #42 | Mistral Medium 3 | Mistral | 0.51s | 42.2 tok/s | $0.800/M |
| #43 | Phi-4 | Microsoft | 0.52s | 36.4 tok/s | $0.219/M |
| #44 | GPT-4.1 mini | OpenAI | 0.52s | 79.3 tok/s | $0.700/M |
| #45 | GPT-4o (May '24) | OpenAI | 0.52s | 92.6 tok/s | $7.50/M |
| #46 | GPT-5.4 nano (Non-Reasoning) | OpenAI | 0.53s | 155.2 tok/s | $0.463/M |
| #47 | Mistral Small 4 (Non-reasoning) | Mistral | 0.53s | 164.1 tok/s | $0.262/M |
| #48 | gpt-oss-120b (high) | OpenAI | 0.53s | 348.5 tok/s | $0.262/M |
| #49 | GPT-5.4 mini (Non-Reasoning) | OpenAI | 0.53s | 165.1 tok/s | $1.69/M |
| #50 | Devstral Medium | Mistral | 0.54s | 59.1 tok/s | $0.800/M |
| #51 | Llama 3.1 Instruct 70B | Meta | 0.54s | 43.3 tok/s | $0.560/M |
| #52 | Grok 4.3 (Non-reasoning) | xAI | 0.55s | 131.8 tok/s | $1.56/M |
| #53 | GPT-5 (ChatGPT) | OpenAI | 0.55s | 167.3 tok/s | $3.44/M |
| #54 | NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | NVIDIA | 0.55s | 223.7 tok/s | $0.300/M |
| #55 | GPT-4o (Aug '24) | OpenAI | 0.55s | 94.1 tok/s | $4.38/M |
| #56 | Mistral Small 4 (Reasoning) | Mistral | 0.55s | 167.3 tok/s | $0.262/M |
| #57 | GPT-4o mini | OpenAI | 0.55s | 69.7 tok/s | $0.262/M |
| #58 | Grok 3 mini Reasoning (high) | xAI | 0.55s | 58.8 tok/s | $0.350/M |
| #59 | Llama 3.2 Instruct 90B (Vision) | Meta | 0.55s | 49 tok/s | $1.38/M |
| #60 | Gemini 2.5 Flash (Non-reasoning) | Google | 0.56s | 185.1 tok/s | $0.850/M |
| #61 | Llama 3.2 Instruct 11B (Vision) | Meta | 0.56s | 87.4 tok/s | $0.245/M |
| #62 | GPT-4.1 | OpenAI | 0.56s | 128.3 tok/s | $3.50/M |
| #63 | Llama 3.2 Instruct 1B | Meta | 0.57s | 92.9 tok/s | $0.050/M |
| #64 | Llama 4 Maverick | Meta | 0.59s | 92.9 tok/s | $0.475/M |
| #65 | Devstral 2 | Mistral | 0.60s | 54.1 tok/s | - |
| #66 | Nemotron 3 Nano Omni 30B A3B Reasoning | NVIDIA | 0.60s | 276.7 tok/s | $0.131/M |
| #67 | Nova Micro | Amazon | 0.60s | 284.2 tok/s | $0.061/M |
| #68 | Llama 4 Scout | Meta | 0.60s | 107.8 tok/s | $0.292/M |
| #69 | Magistral Medium 1.2 | Mistral | 0.61s | 41.1 tok/s | $2.75/M |
| #70 | Hermes 4 - Llama-3.1 70B (Non-reasoning) | Nous Research | 0.61s | 84.9 tok/s | $0.198/M |
| #71 | Llama 3.3 Instruct 70B | Meta | 0.61s | 101.2 tok/s | $0.612/M |
| #72 | GPT-4o (Nov '24) | OpenAI | 0.61s | 142.1 tok/s | $4.38/M |
| #73 | GPT-5.2 (Non-reasoning) | OpenAI | 0.62s | 61 tok/s | $4.81/M |
| #74 | Hermes 4 - Llama-3.1 70B (Reasoning) | Nous Research | 0.64s | 87.2 tok/s | $0.198/M |
| #75 | Nova Lite | Amazon | 0.64s | 191.7 tok/s | $0.105/M |
| #76 | Llama 3.2 Instruct 3B | Meta | 0.65s | 52.3 tok/s | $0.150/M |
| #77 | GPT-5 nano (minimal) | OpenAI | 0.65s | 153.9 tok/s | $0.138/M |
| #78 | GPT-5.1 (Non-reasoning) | OpenAI | 0.65s | 93.1 tok/s | $3.44/M |
| #79 | Mistral Medium 3.5 | Mistral | 0.66s | 66.5 tok/s | $3.00/M |
| #80 | Mistral Large 3 | Mistral | 0.67s | 64.7 tok/s | $0.750/M |