Advanced math competition benchmark score.
AIME is the American Invitational Mathematics Examination from the MAA. The official competition is a 15-question, 3-hour exam with integer answers from 0 to 999. In model leaderboards it is used as a concise advanced math reasoning benchmark.
Test type: Short-answer competition math, scored by numerical answer extraction and normalization.
194 models have this metric.
Current leader: GPT-5 (high)

This app ranks models by the AIME score exposed in the Artificial Analysis snapshot.
Top models ranked by AIME.
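Ranking from a snapshot reduces to filtering out models without a score and sorting descending on the AIME field. A minimal sketch, assuming a list-of-dicts shape — the field names are not the actual snapshot schema:

```python
def rank_by_aime(models: list[dict]) -> list[dict]:
    """Return models that report an AIME score, highest score first.

    Python's sort is stable, so models tied on score keep snapshot order.
    """
    scored = [m for m in models if m.get("aime") is not None]
    return sorted(scored, key=lambda m: m["aime"], reverse=True)

# Hypothetical snapshot entries (names and scores are placeholders).
snapshot = [
    {"name": "Model A", "aime": 91.7},
    {"name": "Model B", "aime": None},  # no AIME score reported
    {"name": "Model C", "aime": 94.0},
]
# rank_by_aime(snapshot) puts Model C first and drops Model B
```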
| Rank | Model | Creator | AIME Score | Speed | Blended Price |
|---|---|---|---|---|---|
| #1 | GPT-5 (high) | OpenAI | 95.7% | 84.2 tok/s | $3.44/M |
| #2 | | xAI | 94.3% | 50.3 tok/s | $6.00/M |
| #3 | o4-mini (high) | OpenAI | 94.0% | 124.5 tok/s | $1.93/M |
| #4 | Qwen3 235B A22B 2507 (Reasoning) | Alibaba | 94.0% | 56 tok/s | $2.63/M |
| #5 | Grok 3 mini Reasoning (high) | xAI | 93.3% | 215.5 tok/s | $0.350/M |
| #6 | GPT-5 (medium) | OpenAI | 91.7% | 82.3 tok/s | $3.44/M |
| #7 | Qwen3 30B A3B 2507 (Reasoning) | Alibaba | 90.7% | 143.2 tok/s | $0.750/M |
| #8 | o3 | OpenAI | 90.3% | 72.7 tok/s | $3.50/M |
| #9 | DeepSeek R1 0528 (May '25) | DeepSeek | 89.3% | n/a | $2.36/M |
| #10 | Gemini 2.5 Pro | Google | 88.7% | 120.2 tok/s | $3.44/M |
| #11 | GLM-4.5 (Reasoning) | Z AI | 87.3% | 46.4 tok/s | $1.00/M |
| #12 | Gemini 2.5 Pro Preview (Mar '25) | Google | 87.0% | n/a | - |
| #13 | Llama Nemotron Super 49B v1.5 (Reasoning) | NVIDIA | 86.0% | 50.8 tok/s | $0.175/M |
| #14 | o3-mini (high) | OpenAI | 86.0% | 140 tok/s | $1.93/M |
| #15 | MiniMax M1 80k | MiniMax | 84.7% | n/a | $0.963/M |
| #16 | EXAONE 4.0 32B (Reasoning) | LG AI Research | 84.3% | n/a | - |
| #17 | Gemini 2.5 Flash Preview (Reasoning) | Google | 84.3% | n/a | - |
| #18 | Gemini 2.5 Pro Preview (May '25) | Google | 84.3% | n/a | $3.44/M |
| #19 | Qwen3 235B A22B (Reasoning) | Alibaba | 84.0% | 61.4 tok/s | $2.63/M |
| #20 | GPT-5 (low) | OpenAI | 83.0% | 65.8 tok/s | $3.44/M |
| #21 | Gemini 2.5 Flash (Reasoning) | Google | 82.3% | 199.6 tok/s | $0.850/M |
| #22 | MiniMax M1 40k | MiniMax | 81.3% | n/a | - |
| #23 | Qwen3 32B (Reasoning) | Alibaba | 80.7% | 91.4 tok/s | $2.63/M |
| #24 | Sonar Reasoning Pro | Perplexity | 79.0% | n/a | - |
| #25 | QwQ 32B | Alibaba | 78.0% | 30.4 tok/s | $0.745/M |
| #26 | Claude 4 Sonnet (Reasoning) | Anthropic | 77.3% | 50.3 tok/s | $6.00/M |
| #27 | o3-mini | OpenAI | 77.0% | 140.1 tok/s | $1.93/M |
| #28 | Sonar Reasoning | Perplexity | 77.0% | n/a | - |
| #29 | Qwen3 14B (Reasoning) | Alibaba | 76.3% | 63 tok/s | $1.31/M |
| #30 | Claude 4 Opus (Reasoning) | Anthropic | 75.7% | 36.8 tok/s | $30.00/M |
| #31 | Qwen3 30B A3B (Reasoning) | Alibaba | 75.3% | 77.8 tok/s | $0.750/M |
| #32 | Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | NVIDIA | 74.7% | 41 tok/s | $0.900/M |
| #33 | Qwen3 8B (Reasoning) | Alibaba | 74.7% | 87.9 tok/s | $0.660/M |
| #34 | Qwen3 30B A3B 2507 Instruct | Alibaba | 72.7% | 97.9 tok/s | $0.350/M |
| #35 | o1 | OpenAI | 72.3% | 103.3 tok/s | $26.25/M |
| #36 | Qwen3 235B A22B 2507 Instruct | Alibaba | 71.7% | 64.7 tok/s | $1.23/M |
| #37 | Magistral Small 1 | Mistral | 71.3% | n/a | - |
| #38 | Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | NVIDIA | 70.7% | n/a | - |
| #39 | Gemini 2.5 Flash-Lite (Reasoning) | Google | 70.3% | 243.6 tok/s | $0.175/M |
| #40 | Magistral Medium 1 | Mistral | 70.0% | n/a | - |
| #41 | Kimi K2 | Kimi | 69.3% | 33 tok/s | $1.04/M |
| #42 | Solar Pro 2 (Reasoning) | Upstage | 69.0% | n/a | - |
| #43 | DeepSeek R1 Distill Qwen 32B | DeepSeek | 68.7% | n/a | - |
| #44 | DeepSeek R1 (Jan '25) | DeepSeek | 68.3% | n/a | $2.36/M |
| #45 | GLM-4.5-Air | Z AI | 67.3% | 72.9 tok/s | $0.372/M |
| #46 | DeepSeek R1 Distill Llama 70B | DeepSeek | 67.0% | 44 tok/s | $0.875/M |
| #47 | DeepSeek R1 Distill Qwen 14B | DeepSeek | 66.7% | n/a | - |
| #48 | Solar Pro 2 (Preview) (Reasoning) | Upstage | 66.3% | n/a | - |
| #49 | Qwen3 4B (Reasoning) | Alibaba | 65.7% | 101.8 tok/s | $0.398/M |
| #50 | DeepSeek R1 0528 Qwen3 8B | DeepSeek | 65.0% | n/a | - |
| #51 | o1-mini | OpenAI | 60.3% | n/a | - |
| #52 | Llama 3.3 Nemotron Super 49B v1 (Reasoning) | NVIDIA | 58.3% | n/a | - |
| #53 | Claude 4 Opus (Non-reasoning) | Anthropic | 56.3% | 36.6 tok/s | $30.00/M |
| #54 | DeepSeek V3 0324 | DeepSeek | 52.0% | n/a | $1.25/M |
| #55 | Qwen3 1.7B (Reasoning) | Alibaba | 51.0% | 136.7 tok/s | $0.398/M |
| #56 | Reka Flash 3 | Reka AI | 51.0% | 90.6 tok/s | $0.350/M |
| #57 | Gemini 2.0 Flash Thinking Experimental (Jan '25) | Google | 50.0% | n/a | - |
| #58 | Gemini 2.5 Flash (Non-reasoning) | Google | 50.0% | 189.1 tok/s | $0.850/M |
| #59 | Gemini 2.5 Flash-Lite (Non-reasoning) | Google | 50.0% | 239.9 tok/s | $0.175/M |
| #60 | ERNIE 4.5 300B A47B | Baidu | 49.3% | 22.7 tok/s | $0.485/M |
| #61 | Claude 3.7 Sonnet (Reasoning) | Anthropic | 48.7% | n/a | $6.00/M |
| #62 | Sonar | Perplexity | 48.7% | n/a | - |
| #63 | Qwen3 Coder 480B A35B Instruct | Alibaba | 47.7% | 66.1 tok/s | $3.00/M |
| #64 | EXAONE 4.0 32B (Non-reasoning) | LG AI Research | 47.0% | n/a | - |
| #65 | QwQ 32B-Preview | Alibaba | 45.3% | n/a | - |
| #66 | Mistral Medium 3 | Mistral | 44.0% | 56.8 tok/s | $0.800/M |
| #67 | GPT-4.1 | OpenAI | 43.7% | 86.4 tok/s | $3.50/M |
| #68 | Gemini 2.5 Flash Preview (Non-reasoning) | Google | 43.3% | n/a | - |
| #69 | GPT-4.1 mini | OpenAI | 43.0% | 78.4 tok/s | $0.700/M |
| #70 | Claude 4 Sonnet (Non-reasoning) | Anthropic | 40.7% | 47.6 tok/s | $6.00/M |
| #71 | Solar Pro 2 (Non-reasoning) | Upstage | 40.7% | n/a | - |
| #72 | Llama 4 Maverick | Meta | 39.0% | 115.2 tok/s | $0.475/M |
| #73 | GPT-5 (minimal) | OpenAI | 36.7% | 64.7 tok/s | $3.44/M |
| #74 | Gemini 2.0 Pro Experimental (Feb '25) | Google | 36.0% | n/a | - |
| #75 | DeepSeek R1 Distill Llama 8B | DeepSeek | 33.3% | n/a | - |
| #76 | Gemini 2.0 Flash (Feb '25) | Google | 33.0% | n/a | $0.263/M |
| #77 | Grok 3 | xAI | 33.0% | 52.2 tok/s | $6.00/M |
| #78 | GPT-4o (March 2025, chatgpt-4o-latest) | OpenAI | 32.7% | n/a | - |
| #79 | Qwen3 235B A22B (Non-reasoning) | Alibaba | 32.7% | 61.1 tok/s | $1.23/M |
| #80 | Mistral Small 3.2 | Mistral | 32.3% | 153.8 tok/s | $0.150/M |
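The Blended Price column folds separate input- and output-token prices into one $/M figure. A common convention is a weighted average with a 3:1 input:output token mix — the ratio, function name, and example rates below are assumptions for illustration, not values taken from the snapshot:

```python
def blended_price(input_usd_per_m: float, output_usd_per_m: float,
                  input_ratio: int = 3, output_ratio: int = 1) -> float:
    """Weighted average of input/output prices per million tokens.

    Assumes a 3:1 input:output usage mix by default.
    """
    total = input_ratio + output_ratio
    return (input_usd_per_m * input_ratio
            + output_usd_per_m * output_ratio) / total
```

For example, hypothetical rates of $1.25/M input and $10.00/M output blend to (3 × 1.25 + 10) / 4 ≈ $3.44/M under this weighting.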