Easy Benchmarks: LLM model index

AIME

Advanced math competition benchmark score.

AIME is the American Invitational Mathematics Examination, administered by the Mathematical Association of America (MAA). The official competition is a 15-question, 3-hour exam with integer answers from 0 to 999. On model leaderboards it serves as a concise benchmark of advanced mathematical reasoning.

Test type: Short-answer competition math, scored by numerical answer extraction and normalization.
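
To make that grading step concrete, here is a minimal Python sketch of answer extraction and normalization. The exact extraction rules used by Artificial Analysis are not documented on this page, so the regex, last-number heuristic, and 0–999 validation below are illustrative assumptions rather than the actual grading code.

```python
import re

def extract_aime_answer(response: str) -> int | None:
    """Pull a candidate AIME answer (an integer from 0 to 999) out of a model response.
    Illustrative sketch only; not the grader used by Artificial Analysis."""
    # Strip commas so formatting like "1,000" doesn't split the number.
    cleaned = response.replace(",", "")
    matches = re.findall(r"-?\d+", cleaned)
    if not matches:
        return None
    value = int(matches[-1])  # heuristic: treat the last number as the final answer
    # AIME answers must be integers in [0, 999]; anything else counts as no answer.
    return value if 0 <= value <= 999 else None

def score(predicted: int | None, gold: int) -> bool:
    """Exact match against the official integer answer."""
    return predicted is not None and predicted == gold

print(extract_aime_answer("After simplifying, the answer is 204."))  # 204
print(score(204, 204))                                               # True
```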

Coverage

194 models have this metric. Current leader: GPT-5 (high) with a score of 95.7%.

Project links

This app ranks models by the AIME score exposed in the Artificial Analysis snapshot.

Official website · Artificial Analysis methodology
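
For context on what that ranking involves, below is a hedged Python sketch of filtering and sorting a snapshot of model records by AIME score. The record fields (`name`, `creator`, `aime`, `speed_tps`, `blended_price`) are hypothetical placeholders for illustration, not the actual Artificial Analysis schema.

```python
from typing import TypedDict

class ModelRecord(TypedDict, total=False):
    name: str
    creator: str
    aime: float           # AIME score as a percentage (assumed field)
    speed_tps: float      # output speed, tokens per second (assumed field)
    blended_price: float  # blended USD price per million tokens (assumed field)

def rank_by_aime(snapshot: list[ModelRecord]) -> list[ModelRecord]:
    """Keep only models that report an AIME score, then sort descending."""
    scored = [m for m in snapshot if m.get("aime") is not None]
    return sorted(scored, key=lambda m: m["aime"], reverse=True)

# Tiny example snapshot using two rows from the leaderboard below.
snapshot: list[ModelRecord] = [
    {"name": "Grok 4", "creator": "xAI", "aime": 94.3,
     "speed_tps": 50.3, "blended_price": 6.00},
    {"name": "GPT-5 (high)", "creator": "OpenAI", "aime": 95.7,
     "speed_tps": 84.2, "blended_price": 3.44},
]

for rank, m in enumerate(rank_by_aime(snapshot), start=1):
    print(f"#{rank} {m['name']} ({m['creator']}): {m['aime']}%")
```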

Top AIME Models

Top models ranked by AIME.

Leaderboard

| Rank | Model | Creator | AIME Score | Speed | Blended Price |
|---|---|---|---|---|---|
| #1 | GPT-5 (high) | OpenAI | 95.7% | 84.2 tok/s | $3.44/M |
| #2 | Grok 4 | xAI | 94.3% | 50.3 tok/s | $6.00/M |
| #3 | o4-mini (high) | OpenAI | 94.0% | 124.5 tok/s | $1.93/M |
| #4 | Qwen3 235B A22B 2507 (Reasoning) | Alibaba | 94.0% | 56 tok/s | $2.63/M |
| #5 | Grok 3 mini Reasoning (high) | xAI | 93.3% | 215.5 tok/s | $0.350/M |
| #6 | GPT-5 (medium) | OpenAI | 91.7% | 82.3 tok/s | $3.44/M |
| #7 | Qwen3 30B A3B 2507 (Reasoning) | Alibaba | 90.7% | 143.2 tok/s | $0.750/M |
| #8 | o3 | OpenAI | 90.3% | 72.7 tok/s | $3.50/M |
| #9 | DeepSeek R1 0528 (May '25) | DeepSeek | 89.3% | n/a | $2.36/M |
| #10 | Gemini 2.5 Pro | Google | 88.7% | 120.2 tok/s | $3.44/M |
| #11 | GLM-4.5 (Reasoning) | Z AI | 87.3% | 46.4 tok/s | $1.00/M |
| #12 | Gemini 2.5 Pro Preview (Mar '25) | Google | 87.0% | n/a | - |
| #13 | Llama Nemotron Super 49B v1.5 (Reasoning) | NVIDIA | 86.0% | 50.8 tok/s | $0.175/M |
| #14 | o3-mini (high) | OpenAI | 86.0% | 140 tok/s | $1.93/M |
| #15 | MiniMax M1 80k | MiniMax | 84.7% | n/a | $0.963/M |
| #16 | EXAONE 4.0 32B (Reasoning) | LG AI Research | 84.3% | n/a | - |
| #17 | Gemini 2.5 Flash Preview (Reasoning) | Google | 84.3% | n/a | - |
| #18 | Gemini 2.5 Pro Preview (May '25) | Google | 84.3% | n/a | $3.44/M |
| #19 | Qwen3 235B A22B (Reasoning) | Alibaba | 84.0% | 61.4 tok/s | $2.63/M |
| #20 | GPT-5 (low) | OpenAI | 83.0% | 65.8 tok/s | $3.44/M |
| #21 | Gemini 2.5 Flash (Reasoning) | Google | 82.3% | 199.6 tok/s | $0.850/M |
| #22 | MiniMax M1 40k | MiniMax | 81.3% | n/a | - |
| #23 | Qwen3 32B (Reasoning) | Alibaba | 80.7% | 91.4 tok/s | $2.63/M |
| #24 | Sonar Reasoning Pro | Perplexity | 79.0% | n/a | - |
| #25 | QwQ 32B | Alibaba | 78.0% | 30.4 tok/s | $0.745/M |
| #26 | Claude 4 Sonnet (Reasoning) | Anthropic | 77.3% | 50.3 tok/s | $6.00/M |
| #27 | o3-mini | OpenAI | 77.0% | 140.1 tok/s | $1.93/M |
| #28 | Sonar Reasoning | Perplexity | 77.0% | n/a | - |
| #29 | Qwen3 14B (Reasoning) | Alibaba | 76.3% | 63 tok/s | $1.31/M |
| #30 | Claude 4 Opus (Reasoning) | Anthropic | 75.7% | 36.8 tok/s | $30.00/M |
| #31 | Qwen3 30B A3B (Reasoning) | Alibaba | 75.3% | 77.8 tok/s | $0.750/M |
| #32 | Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | NVIDIA | 74.7% | 41 tok/s | $0.900/M |
| #33 | Qwen3 8B (Reasoning) | Alibaba | 74.7% | 87.9 tok/s | $0.660/M |
| #34 | Qwen3 30B A3B 2507 Instruct | Alibaba | 72.7% | 97.9 tok/s | $0.350/M |
| #35 | o1 | OpenAI | 72.3% | 103.3 tok/s | $26.25/M |
| #36 | Qwen3 235B A22B 2507 Instruct | Alibaba | 71.7% | 64.7 tok/s | $1.23/M |
| #37 | Magistral Small 1 | Mistral | 71.3% | n/a | - |
| #38 | Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | NVIDIA | 70.7% | n/a | - |
| #39 | Gemini 2.5 Flash-Lite (Reasoning) | Google | 70.3% | 243.6 tok/s | $0.175/M |
| #40 | Magistral Medium 1 | Mistral | 70.0% | n/a | - |
| #41 | Kimi K2 | Kimi | 69.3% | 33 tok/s | $1.04/M |
| #42 | Solar Pro 2 (Reasoning) | Upstage | 69.0% | n/a | - |
| #43 | DeepSeek R1 Distill Qwen 32B | DeepSeek | 68.7% | n/a | - |
| #44 | DeepSeek R1 (Jan '25) | DeepSeek | 68.3% | n/a | $2.36/M |
| #45 | GLM-4.5-Air | Z AI | 67.3% | 72.9 tok/s | $0.372/M |
| #46 | DeepSeek R1 Distill Llama 70B | DeepSeek | 67.0% | 44 tok/s | $0.875/M |
| #47 | DeepSeek R1 Distill Qwen 14B | DeepSeek | 66.7% | n/a | - |
| #48 | Solar Pro 2 (Preview) (Reasoning) | Upstage | 66.3% | n/a | - |
| #49 | Qwen3 4B (Reasoning) | Alibaba | 65.7% | 101.8 tok/s | $0.398/M |
| #50 | DeepSeek R1 0528 Qwen3 8B | DeepSeek | 65.0% | n/a | - |
| #51 | o1-mini | OpenAI | 60.3% | n/a | - |
| #52 | Llama 3.3 Nemotron Super 49B v1 (Reasoning) | NVIDIA | 58.3% | n/a | - |
| #53 | Claude 4 Opus (Non-reasoning) | Anthropic | 56.3% | 36.6 tok/s | $30.00/M |
| #54 | DeepSeek V3 0324 | DeepSeek | 52.0% | n/a | $1.25/M |
| #55 | Qwen3 1.7B (Reasoning) | Alibaba | 51.0% | 136.7 tok/s | $0.398/M |
| #56 | Reka Flash 3 | Reka AI | 51.0% | 90.6 tok/s | $0.350/M |
| #57 | Gemini 2.0 Flash Thinking Experimental (Jan '25) | Google | 50.0% | n/a | - |
| #58 | Gemini 2.5 Flash (Non-reasoning) | Google | 50.0% | 189.1 tok/s | $0.850/M |
| #59 | Gemini 2.5 Flash-Lite (Non-reasoning) | Google | 50.0% | 239.9 tok/s | $0.175/M |
| #60 | ERNIE 4.5 300B A47B | Baidu | 49.3% | 22.7 tok/s | $0.485/M |
| #61 | Claude 3.7 Sonnet (Reasoning) | Anthropic | 48.7% | n/a | $6.00/M |
| #62 | Sonar | Perplexity | 48.7% | n/a | - |
| #63 | Qwen3 Coder 480B A35B Instruct | Alibaba | 47.7% | 66.1 tok/s | $3.00/M |
| #64 | EXAONE 4.0 32B (Non-reasoning) | LG AI Research | 47.0% | n/a | - |
| #65 | QwQ 32B-Preview | Alibaba | 45.3% | n/a | - |
| #66 | Mistral Medium 3 | Mistral | 44.0% | 56.8 tok/s | $0.800/M |
| #67 | GPT-4.1 | OpenAI | 43.7% | 86.4 tok/s | $3.50/M |
| #68 | Gemini 2.5 Flash Preview (Non-reasoning) | Google | 43.3% | n/a | - |
| #69 | GPT-4.1 mini | OpenAI | 43.0% | 78.4 tok/s | $0.700/M |
| #70 | Claude 4 Sonnet (Non-reasoning) | Anthropic | 40.7% | 47.6 tok/s | $6.00/M |
| #71 | Solar Pro 2 (Non-reasoning) | Upstage | 40.7% | n/a | - |
| #72 | Llama 4 Maverick | Meta | 39.0% | 115.2 tok/s | $0.475/M |
| #73 | GPT-5 (minimal) | OpenAI | 36.7% | 64.7 tok/s | $3.44/M |
| #74 | Gemini 2.0 Pro Experimental (Feb '25) | Google | 36.0% | n/a | - |
| #75 | DeepSeek R1 Distill Llama 8B | DeepSeek | 33.3% | n/a | - |
| #76 | Gemini 2.0 Flash (Feb '25) | Google | 33.0% | n/a | $0.263/M |
| #77 | Grok 3 | xAI | 33.0% | 52.2 tok/s | $6.00/M |
| #78 | GPT-4o (March 2025, chatgpt-4o-latest) | OpenAI | 32.7% | n/a | - |
| #79 | Qwen3 235B A22B (Non-reasoning) | Alibaba | 32.7% | 61.1 tok/s | $1.23/M |
| #80 | Mistral Small 3.2 | Mistral | 32.3% | 153.8 tok/s | $0.150/M |