Median seconds until the first output token.
Time to First Token measures perceived responsiveness for streamed responses. Artificial Analysis defines it as the time from request submission to the first received token, with lower values ranking better in this app.
Test type: Hosted endpoint latency measurement. Lower values mean the response starts sooner.
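As a rough illustration of the definition above, the sketch below times a streamed response from request submission to the first received token and takes the median over several runs. It is not Artificial Analysis's measurement harness. The names `time_to_first_token`, `median_ttft`, and `fake_stream` are hypothetical, and `fake_stream` is a stand-in for a real streaming endpoint.

```python
import statistics
import time
from typing import Callable, Iterable


def time_to_first_token(open_stream: Callable[[], Iterable[str]]) -> float:
    """Seconds from submitting the request to receiving the first token."""
    start = time.perf_counter()      # request submission
    stream = iter(open_stream())     # assumed: this call sends the request
    first = next(stream, None)       # blocks until the first token arrives
    if first is None:
        raise ValueError("stream ended before any token was received")
    return time.perf_counter() - start


def median_ttft(open_stream: Callable[[], Iterable[str]], runs: int = 5) -> float:
    """Median TTFT in seconds over several independent requests."""
    return statistics.median(time_to_first_token(open_stream) for _ in range(runs))


def fake_stream(delay_s: float = 0.02):
    """Hypothetical stand-in for an endpoint: waits, then yields tokens."""
    time.sleep(delay_s)              # simulated queueing and prefill latency
    yield "Hello"
    yield " world"


if __name__ == "__main__":
    print(f"median TTFT: {median_ttft(fake_stream):.3f}s")
```

Only the wait for the first token is timed. The rest of the stream is left unread, so output speed (tok/s) does not affect the result.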
300 models have this metric.
Current leader: Command A+
Values come from Artificial Analysis performance data in the committed snapshot.
Top models ranked by TTFT.
| Rank | Model | Creator | Value | Speed | Blended Price |
|---|---|---|---|---|---|
| #1 | Command A+ | Cohere | 0.18s | 211.8 tok/s | - |
| #2 | | Alibaba | 0.21s | 318.9 tok/s | $0.040/M |
| #3 | Qwen3.5 4B (Reasoning) | Alibaba | 0.22s | 195.8 tok/s | $0.060/M |
| #4 | NVIDIA Nemotron Nano 9B V2 (Reasoning) | NVIDIA | 0.23s | 118 tok/s | $0.070/M |
| #5 | Llama 3.1 Nemotron Instruct 70B | NVIDIA | 0.25s | 290.5 tok/s | $1.20/M |
| #6 | Qwen3.5 4B (Non-reasoning) | Alibaba | 0.25s | 208.9 tok/s | $0.060/M |
| #7 | Qwen3.5 0.8B (Non-reasoning) | Alibaba | 0.25s | 88 tok/s | $0.020/M |
| #8 | NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) | NVIDIA | 0.26s | 87.3 tok/s | $0.088/M |
| #9 | LFM2 24B A2B | Liquid AI | 0.28s | 127.6 tok/s | $0.052/M |
| #10 | Llama Nemotron Super 49B v1.5 (Reasoning) | NVIDIA | 0.30s | 44.2 tok/s | $0.175/M |
| #11 | Llama Nemotron Super 49B v1.5 (Non-reasoning) | NVIDIA | 0.30s | 43.9 tok/s | $0.175/M |
| #12 | Tiny Aya Global | Cohere | 0.32s | 123.9 tok/s | - |
| #13 | DeepSeek R1 Distill Llama 70B | DeepSeek | 0.32s | 44.7 tok/s | $0.787/M |
| #14 | Command A | Cohere | 0.33s | 65.6 tok/s | $4.38/M |
| #15 | Gemini 2.5 Flash-Lite (Non-reasoning) | Google | 0.36s | 229.5 tok/s | $0.175/M |
| #16 | Mistral Small 3.2 | Mistral | 0.36s | 127.1 tok/s | $0.128/M |
| #17 | Mistral 7B Instruct | Mistral | 0.38s | 110.4 tok/s | $0.206/M |
| #18 | Granite 4.1 8B | IBM | 0.38s | 134.2 tok/s | $0.063/M |
| #19 | Ministral 3 3B | Mistral | 0.39s | 174.3 tok/s | $0.100/M |
| #20 | Phi-4 Multimodal Instruct | Microsoft | 0.39s | 12.3 tok/s | - |
| #21 | Hermes 3 - Llama-3.1 70B | Nous Research | 0.41s | 34.1 tok/s | $0.300/M |
| #22 | Ministral 3 14B | Mistral | 0.43s | 106.9 tok/s | $0.200/M |
| #23 | Magistral Small 1.2 | Mistral | 0.43s | 110.9 tok/s | $0.750/M |
| #24 | Ministral 3 8B | Mistral | 0.43s | 103.6 tok/s | $0.150/M |
| #25 | Gemma 3n E4B Instruct | Google | 0.44s | 50 tok/s | $0.025/M |
| #26 | Cogito v2.1 (Reasoning) | Deep Cogito | 0.46s | 62.8 tok/s | $1.25/M |
| #27 | Mistral Small (Sep '24) | Mistral | 0.46s | 151.5 tok/s | $0.300/M |
| #28 | Mistral Small 3.1 | Mistral | 0.47s | 158.2 tok/s | $0.138/M |
| #29 | gpt-oss-20B (low) | OpenAI | 0.48s | 224.2 tok/s | $0.095/M |
| #30 | Mistral Small (Feb '24) | Mistral | 0.48s | 157.3 tok/s | $1.50/M |
| #31 | Mistral Small 3 | Mistral | 0.48s | 153.7 tok/s | $0.104/M |
| #32 | QwQ 32B | Alibaba | 0.48s | 31 tok/s | $0.745/M |
| #33 | gpt-oss-20B (high) | OpenAI | 0.48s | 240 tok/s | $0.088/M |
| #34 | GPT-3.5 Turbo | OpenAI | 0.49s | 132.3 tok/s | $0.750/M |
| #35 | Llama 3 Instruct 8B | Meta | 0.49s | 88.3 tok/s | $0.070/M |
| #36 | gpt-oss-120b (low) | OpenAI | 0.50s | 363.9 tok/s | $0.262/M |
| #37 | Mistral Medium 3.1 | Mistral | 0.51s | 70 tok/s | $0.800/M |
| #38 | GPT-4.1 nano | OpenAI | 0.51s | 118.2 tok/s | $0.175/M |
| #39 | Devstral Small (Jul '25) | Mistral | 0.51s | 42.3 tok/s | $0.150/M |
| #40 | Mistral Medium | Mistral | 0.51s | 66.1 tok/s | $4.09/M |
| #41 | Llama 3.1 Instruct 8B | Meta | 0.51s | 201.5 tok/s | $0.100/M |
| #42 | Mistral Medium 3 | Mistral | 0.51s | 42.2 tok/s | $0.800/M |
| #43 | Phi-4 | Microsoft | 0.52s | 36.4 tok/s | $0.219/M |
| #44 | GPT-4.1 mini | OpenAI | 0.52s | 79.3 tok/s | $0.700/M |
| #45 | GPT-4o (May '24) | OpenAI | 0.52s | 92.6 tok/s | $7.50/M |
| #46 | GPT-5.4 nano (Non-Reasoning) | OpenAI | 0.53s | 155.2 tok/s | $0.463/M |
| #47 | Mistral Small 4 (Non-reasoning) | Mistral | 0.53s | 164.1 tok/s | $0.262/M |
| #48 | gpt-oss-120b (high) | OpenAI | 0.53s | 348.5 tok/s | $0.262/M |
| #49 | GPT-5.4 mini (Non-Reasoning) | OpenAI | 0.53s | 165.1 tok/s | $1.69/M |
| #50 | Devstral Medium | Mistral | 0.54s | 59.1 tok/s | $0.800/M |
| #51 | Llama 3.1 Instruct 70B | Meta | 0.54s | 43.3 tok/s | $0.560/M |
| #52 | Grok 4.3 (Non-reasoning) | xAI | 0.55s | 131.8 tok/s | $1.56/M |
| #53 | GPT-5 (ChatGPT) | OpenAI | 0.55s | 167.3 tok/s | $3.44/M |
| #54 | NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | NVIDIA | 0.55s | 223.7 tok/s | $0.300/M |
| #55 | GPT-4o (Aug '24) | OpenAI | 0.55s | 94.1 tok/s | $4.38/M |
| #56 | Mistral Small 4 (Reasoning) | Mistral | 0.55s | 167.3 tok/s | $0.262/M |
| #57 | GPT-4o mini | OpenAI | 0.55s | 69.7 tok/s | $0.262/M |
| #58 | Grok 3 mini Reasoning (high) | xAI | 0.55s | 58.8 tok/s | $0.350/M |
| #59 | Llama 3.2 Instruct 90B (Vision) | Meta | 0.55s | 49 tok/s | $1.38/M |
| #60 | Gemini 2.5 Flash (Non-reasoning) | Google | 0.56s | 185.1 tok/s | $0.850/M |
| #61 | Llama 3.2 Instruct 11B (Vision) | Meta | 0.56s | 87.4 tok/s | $0.245/M |
| #62 | GPT-4.1 | OpenAI | 0.56s | 128.3 tok/s | $3.50/M |
| #63 | Llama 3.2 Instruct 1B | Meta | 0.57s | 92.9 tok/s | $0.050/M |
| #64 | Llama 4 Maverick | Meta | 0.59s | 92.9 tok/s | $0.475/M |
| #65 | Devstral 2 | Mistral | 0.60s | 54.1 tok/s | - |
| #66 | Nemotron 3 Nano Omni 30B A3B Reasoning | NVIDIA | 0.60s | 276.7 tok/s | $0.131/M |
| #67 | Nova Micro | Amazon | 0.60s | 284.2 tok/s | $0.061/M |
| #68 | Llama 4 Scout | Meta | 0.60s | 107.8 tok/s | $0.292/M |
| #69 | Magistral Medium 1.2 | Mistral | 0.61s | 41.1 tok/s | $2.75/M |
| #70 | Hermes 4 - Llama-3.1 70B (Non-reasoning) | Nous Research | 0.61s | 84.9 tok/s | $0.198/M |
| #71 | Llama 3.3 Instruct 70B | Meta | 0.61s | 101.2 tok/s | $0.612/M |
| #72 | GPT-4o (Nov '24) | OpenAI | 0.61s | 142.1 tok/s | $4.38/M |
| #73 | GPT-5.2 (Non-reasoning) | OpenAI | 0.62s | 61 tok/s | $4.81/M |
| #74 | Hermes 4 - Llama-3.1 70B (Reasoning) | Nous Research | 0.64s | 87.2 tok/s | $0.198/M |
| #75 | Nova Lite | Amazon | 0.64s | 191.7 tok/s | $0.105/M |
| #76 | Llama 3.2 Instruct 3B | Meta | 0.65s | 52.3 tok/s | $0.150/M |
| #77 | GPT-5 nano (minimal) | OpenAI | 0.65s | 153.9 tok/s | $0.138/M |
| #78 | GPT-5.1 (Non-reasoning) | OpenAI | 0.65s | 93.1 tok/s | $3.44/M |
| #79 | Mistral Medium 3.5 | Mistral | 0.66s | 66.5 tok/s | $3.00/M |
| #80 | Mistral Large 3 | Mistral | 0.67s | 64.7 tok/s | $0.750/M |