Median seconds until the first output token.
Time to First Token (TTFT) measures the perceived responsiveness of streamed responses. Artificial Analysis defines it as the time from request submission to the first received token; this app ranks models ascending, since lower values mean the response starts sooner.
Test type: hosted endpoint latency measurement.
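The measurement above can be sketched in a few lines. This is a minimal illustration, not Artificial Analysis's actual harness: `simulated_stream` is a hypothetical stand-in for a hosted streaming endpoint, and the median over trials mirrors how the leaderboard value is defined.

```python
import statistics
import time


def time_to_first_token(stream):
    """Seconds from request start until the first token arrives."""
    start = time.perf_counter()
    for _ in stream():
        return time.perf_counter() - start  # first token received
    return None  # stream produced no tokens


def simulated_stream(delay=0.05):
    """Stand-in for a hosted streaming endpoint (hypothetical)."""
    def run():
        time.sleep(delay)  # network + prefill latency before token 1
        yield "Hello"
        yield " world"
    return run


# The leaderboard value is the median over repeated trials.
trials = [time_to_first_token(simulated_stream()) for _ in range(5)]
ttft = statistics.median(trials)
print(f"median TTFT: {ttft:.3f}s")
```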
293 models have this metric.
Current leader: NVIDIA Nemotron Nano 9B V2 (Reasoning)
Values come from Artificial Analysis performance data in the committed snapshot.
Top models ranked by TTFT.
| Rank | Model | Creator | TTFT | Output Speed | Blended Price ($/M tokens) |
|---|---|---|---|---|---|
| #1 | NVIDIA Nemotron Nano 9B V2 (Reasoning) | NVIDIA | 0.24s | 121.6 tok/s | $0.070/M |
| #2 | - | NVIDIA | 0.26s | 125 tok/s | $0.300/M |
| #3 | Ministral 3 3B | Mistral | 0.30s | 287.6 tok/s | $0.100/M |
| #4 | Llama Nemotron Super 49B v1.5 (Non-reasoning) | NVIDIA | 0.30s | 51.3 tok/s | $0.175/M |
| #5 | LFM2 24B A2B | Liquid AI | 0.31s | 196.9 tok/s | $0.052/M |
| #6 | Gemma 3n E2B Instruct | Google | 0.31s | 58.7 tok/s | - |
| #7 | gpt-oss-20B (high) | OpenAI | 0.32s | 242.3 tok/s | $0.100/M |
| #8 | Llama Nemotron Super 49B v1.5 (Reasoning) | NVIDIA | 0.33s | 50.8 tok/s | $0.175/M |
| #9 | Ministral 3 8B | Mistral | 0.33s | 157.6 tok/s | $0.150/M |
| #10 | Gemma 3n E4B Instruct | Google | 0.34s | 15.3 tok/s | $0.025/M |
| #11 | Phi-4 Mini Instruct | Microsoft | 0.35s | 44.2 tok/s | - |
| #12 | Ministral 3 14B | Mistral | 0.35s | 121.6 tok/s | $0.200/M |
| #13 | Magistral Small 1.2 | Mistral | 0.35s | 100.3 tok/s | $0.750/M |
| #14 | Mistral 7B Instruct | Mistral | 0.36s | 156.9 tok/s | $0.250/M |
| #15 | Qwen3.5 9B (Reasoning) | Alibaba | 0.36s | 62.9 tok/s | $0.113/M |
| #16 | Phi-4 Multimodal Instruct | Microsoft | 0.37s | 16.7 tok/s | - |
| #17 | Mistral Small 3.2 | Mistral | 0.38s | 153.8 tok/s | $0.150/M |
| #18 | Devstral Small (Jul '25) | Mistral | 0.39s | 194.2 tok/s | $0.150/M |
| #19 | Olmo 3.1 32B Instruct | Allen Institute for AI | 0.39s | 49.3 tok/s | $0.300/M |
| #20 | Grok 3 mini Reasoning (high) | xAI | 0.41s | 215.5 tok/s | $0.350/M |
| #21 | gpt-oss-20B (low) | OpenAI | 0.42s | 249.7 tok/s | $0.108/M |
| #22 | Llama 3.1 Nemotron Instruct 70B | NVIDIA | 0.42s | 36.4 tok/s | $1.20/M |
| #23 | Grok 4.1 Fast (Non-reasoning) | xAI | 0.42s | 112.1 tok/s | $0.275/M |
| #24 | Mistral Medium 3 | Mistral | 0.42s | 56.8 tok/s | $0.800/M |
| #25 | Qwen3.5 0.8B (Non-reasoning) | Alibaba | 0.42s | 273.6 tok/s | $0.020/M |
| #26 | QwQ 32B | Alibaba | 0.43s | 30.4 tok/s | $0.745/M |
| #27 | Hermes 3 - Llama-3.1 70B | Nous Research | 0.44s | 28.8 tok/s | $0.300/M |
| #28 | Command A | Cohere | 0.44s | 50.7 tok/s | $4.38/M |
| #29 | GPT-3.5 Turbo | OpenAI | 0.45s | 92 tok/s | $0.750/M |
| #30 | Grok 4.20 0309 (Non-reasoning) | xAI | 0.45s | 77.1 tok/s | $3.00/M |
| #31 | Grok 4 Fast (Non-reasoning) | xAI | 0.45s | 77.4 tok/s | $0.275/M |
| #32 | Llama 3.1 Instruct 8B | Meta | 0.45s | 164.4 tok/s | $0.100/M |
| #33 | DeepSeek R1 Distill Llama 70B | DeepSeek | 0.46s | 44 tok/s | $0.875/M |
| #34 | GPT-4.1 nano | OpenAI | 0.46s | 125.2 tok/s | $0.175/M |
| #35 | Qwen3.5 4B (Non-reasoning) | Alibaba | 0.46s | 200.2 tok/s | $0.060/M |
| #36 | Gemini 2.5 Flash-Lite (Non-reasoning) | Google | 0.46s | 239.9 tok/s | $0.175/M |
| #37 | Llama 3.2 Instruct 11B (Vision) | Meta | 0.46s | 77.4 tok/s | $0.245/M |
| #38 | Granite 4.1 8B | IBM | 0.47s | 134.6 tok/s | $0.063/M |
| #39 | Qwen3.5 4B (Reasoning) | Alibaba | 0.47s | 204.8 tok/s | $0.060/M |
| #40 | GPT-4o (Nov '24) | OpenAI | 0.47s | 107.3 tok/s | $4.38/M |
| #41 | Magistral Medium 1.2 | Mistral | 0.48s | 42 tok/s | $2.75/M |
| #42 | gpt-oss-120B (high) | OpenAI | 0.49s | 212.3 tok/s | $0.263/M |
| #43 | Grok 3 | xAI | 0.50s | 52.2 tok/s | $6.00/M |
| #44 | Devstral Medium | Mistral | 0.50s | 71.4 tok/s | $0.800/M |
| #45 | Grok 4.20 0309 v2 (Non-reasoning) | xAI | 0.50s | 86.6 tok/s | $3.00/M |
| #46 | Llama 3 Instruct 8B | Meta | 0.50s | 82.2 tok/s | $0.070/M |
| #47 | gpt-oss-120B (low) | OpenAI | 0.50s | 216.3 tok/s | $0.263/M |
| #48 | Phi-4 | Microsoft | 0.51s | 41.6 tok/s | $0.219/M |
| #49 | Qwen3.5 2B (Non-reasoning) | Alibaba | 0.52s | 227 tok/s | $0.040/M |
| #50 | Cogito v2.1 (Reasoning) | Deep Cogito | 0.52s | 51.1 tok/s | $1.25/M |
| #51 | Mistral Small 3.1 | Mistral | 0.52s | 138.8 tok/s | $0.150/M |
| #52 | GPT-4o mini | OpenAI | 0.52s | 59.9 tok/s | $0.263/M |
| #53 | Llama 4 Scout | Meta | 0.52s | 109.2 tok/s | $0.292/M |
| #54 | Gemini 2.5 Flash (Non-reasoning) | Google | 0.53s | 189.1 tok/s | $0.850/M |
| #55 | GPT-4.1 mini | OpenAI | 0.53s | 78.4 tok/s | $0.700/M |
| #56 | Devstral Small 2 | Mistral | 0.53s | 72.5 tok/s | - |
| #57 | Mistral Small 4 (Non-reasoning) | Mistral | 0.53s | 139.5 tok/s | $0.263/M |
| #58 | Llama 3.2 Instruct 90B (Vision) | Meta | 0.54s | 45.8 tok/s | $1.38/M |
| #59 | Pixtral Large | Mistral | 0.54s | 55.5 tok/s | $3.00/M |
| #60 | Claude 4.5 Haiku (Non-reasoning) | Anthropic | 0.57s | 99.9 tok/s | $2.00/M |
| #61 | Trinity Large Thinking | Arcee AI | 0.57s | 124.6 tok/s | $0.395/M |
| #62 | GPT-5.4 (Non-reasoning) | OpenAI | 0.59s | 57.2 tok/s | $5.63/M |
| #63 | Hermes 4 - Llama-3.1 70B (Non-reasoning) | Nous Research | 0.59s | 83.5 tok/s | $0.198/M |
| #64 | Mistral Small (Sep '24) | Mistral | 0.59s | 135 tok/s | $0.300/M |
| #65 | Hermes 4 - Llama-3.1 70B (Reasoning) | Nous Research | 0.59s | 78.6 tok/s | $0.198/M |
| #66 | Mistral Medium 3.1 | Mistral | 0.59s | 56.3 tok/s | $0.800/M |
| #67 | GPT-5.4 mini (Non-Reasoning) | OpenAI | 0.60s | 152.7 tok/s | $1.69/M |
| #68 | Llama 3.1 Instruct 70B | Meta | 0.60s | 32.2 tok/s | $0.560/M |
| #69 | Llama 3.3 Instruct 70B | Meta | 0.60s | 90.1 tok/s | $0.675/M |
| #70 | GPT-5.4 nano (Non-Reasoning) | OpenAI | 0.61s | 148.5 tok/s | $0.463/M |
| #71 | Mistral Small (Feb '24) | Mistral | 0.61s | 130.6 tok/s | $1.50/M |
| #72 | Mistral Large 3 | Mistral | 0.61s | 53.3 tok/s | $0.750/M |
| #73 | Nova Lite | Amazon | 0.63s | 186.8 tok/s | $0.105/M |
| #74 | NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | NVIDIA | 0.63s | 163.1 tok/s | $0.300/M |
| #75 | GPT-4o (May '24) | OpenAI | 0.64s | 91.6 tok/s | $7.50/M |
| #76 | Mistral Small 4 (Reasoning) | Mistral | 0.64s | 149.5 tok/s | $0.263/M |
| #77 | Nova Micro | Amazon | 0.64s | 332.1 tok/s | $0.061/M |
| #78 | GLM-5.1 (Reasoning) | Z AI | 0.65s | 45.7 tok/s | $2.15/M |
| #79 | Llama 3.2 Instruct 1B | Meta | 0.65s | 97.7 tok/s | $0.100/M |
| #80 | Llama 3.2 Instruct 3B | Meta | 0.66s | 52.2 tok/s | $0.150/M |
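A ranking like the table above can be derived from the snapshot with a simple ascending sort. The entries and field names below are illustrative stand-ins, not the committed snapshot's actual schema.

```python
# Illustrative stand-in for a few snapshot entries (values taken from
# the table above); the "ttft_s" field name is an assumption.
snapshot = [
    {"model": "Ministral 3 3B", "ttft_s": 0.30},
    {"model": "NVIDIA Nemotron Nano 9B V2 (Reasoning)", "ttft_s": 0.24},
    {"model": "LFM2 24B A2B", "ttft_s": 0.31},
]

# Lower TTFT ranks better, so sort ascending.
ranked = sorted(snapshot, key=lambda m: m["ttft_s"])
for rank, entry in enumerate(ranked, start=1):
    print(f"#{rank} {entry['model']}: {entry['ttft_s']:.2f}s")
```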