Easy Benchmarks

AIME

Advanced math competition benchmark score.

AIME is the American Invitational Mathematics Examination, run by the Mathematical Association of America (MAA). The official competition is a 15-question, 3-hour exam in which every answer is an integer from 0 to 999. In model leaderboards it serves as a compact benchmark of advanced mathematical reasoning.

Test type: Short-answer competition math, scored by numerical answer extraction and normalization.
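
To make "numerical answer extraction and normalization" concrete, here is a minimal Python sketch of how such scoring might work. It is an illustration under stated assumptions, not the Artificial Analysis implementation: the function names `extract_aime_answer` and `score`, the preference for a `\boxed{...}` answer, and the fallback to the last integer in the response are all hypothetical choices.

```python
import re
from typing import Optional, Sequence


def extract_aime_answer(response: str) -> Optional[int]:
    """Return the final answer as an integer in 0..999, or None if none is found.

    Assumption: a \\boxed{...} answer is preferred; otherwise the last
    integer that appears in the response is used.
    """
    boxed = re.findall(r"\\boxed\{([^{}]*)\}", response)
    if boxed:
        candidate = boxed[-1]
    else:
        integers = re.findall(r"\d+", response)
        if not integers:
            return None
        candidate = integers[-1]

    # Normalization: ignore surrounding whitespace and leading zeros.
    match = re.fullmatch(r"\s*(\d+)\s*", candidate)
    if match is None:
        return None
    value = int(match.group(1))
    # AIME answers are integers from 0 to 999; anything else is rejected.
    return value if 0 <= value <= 999 else None


def score(responses: Sequence[str], references: Sequence[int]) -> float:
    """Percentage of responses whose extracted answer equals the reference."""
    correct = sum(
        extract_aime_answer(resp) == ref
        for resp, ref in zip(responses, references)
    )
    return 100.0 * correct / len(references)
```

With this sketch, `extract_aime_answer("so the answer is \\boxed{073}")` normalizes the leading zero and yields 73, while a response with no integer at all counts as incorrect.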

Coverage

194 models have this metric.

Current leader: GPT-5 (high), 95.7%

Project links

This app ranks models by the AIME score reported in the Artificial Analysis snapshot.

Official website
Artificial Analysis methodology

Top AIME Models

Top models ranked by AIME.
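
As a rough sketch of the ranking step, the Python snippet below sorts entries by AIME score in descending order and assigns sequential ranks. The tie-break (case-insensitive model name) is an assumption: it is consistent with the tied entries visible in the leaderboard, such as the two 94.0% models, but the app's actual rule is not stated. `Entry` and `rank` are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple, Tuple


class Entry(NamedTuple):
    model: str
    aime: float  # AIME score in percent


def rank(entries: List[Entry]) -> List[Tuple[int, Entry]]:
    """Sort by score (highest first) and number the rows from 1.

    Assumed tie-break: case-insensitive model name, ascending.
    """
    ordered = sorted(entries, key=lambda e: (-e.aime, e.model.casefold()))
    return list(enumerate(ordered, start=1))


# Four entries taken from the leaderboard, deliberately out of order.
sample = [
    Entry("Qwen3 235B A22B 2507 (Reasoning)", 94.0),
    Entry("Grok 4", 94.3),
    Entry("o4-mini (high)", 94.0),
    Entry("GPT-5 (high)", 95.7),
]

for position, entry in rank(sample):
    print(f"#{position} {entry.model} {entry.aime:.1f}%")
```

Note that tied models receive distinct sequential ranks here, matching how the leaderboard numbers its rows.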

Leaderboard

| Rank | Model | Creator | Value | Speed | Blended Price |
|---|---|---|---|---|---|
| #1 | GPT-5 (high) | OpenAI | 95.7% | 111.1 tok/s | $3.44/M |
| #2 | Grok 4 | xAI | 94.3% | n/a | $11.00/M |
| #3 | o4-mini (high) | OpenAI | 94.0% | 151 tok/s | $1.93/M |
| #4 | Qwen3 235B A22B 2507 (Reasoning) | Alibaba | 94.0% | 59.4 tok/s | $0.838/M |
| #5 | Grok 3 mini Reasoning (high) | xAI | 93.3% | 58.8 tok/s | $0.350/M |
| #6 | GPT-5 (medium) | OpenAI | 91.7% | 85.6 tok/s | $3.44/M |
| #7 | Qwen3 30B A3B 2507 (Reasoning) | Alibaba | 90.7% | 139.3 tok/s | $0.673/M |
| #8 | o3 | OpenAI | 90.3% | 122.3 tok/s | $3.50/M |
| #9 | DeepSeek R1 0528 (May '25) | DeepSeek | 89.3% | n/a | $2.06/M |
| #10 | Gemini 2.5 Pro | Google | 88.7% | 132 tok/s | $3.44/M |
| #11 | GLM-4.5 (Reasoning) | Z AI | 87.3% | 50.1 tok/s | $1.00/M |
| #12 | Gemini 2.5 Pro Preview (Mar' 25) | Google | 87.0% | n/a | - |
| #13 | Llama Nemotron Super 49B v1.5 (Reasoning) | NVIDIA | 86.0% | 44.2 tok/s | $0.175/M |
| #14 | o3-mini (high) | OpenAI | 86.0% | 218.5 tok/s | $1.93/M |
| #15 | MiniMax M1 80k | MiniMax | 84.7% | n/a | $0.963/M |
| #16 | EXAONE 4.0 32B (Reasoning) | LG AI Research | 84.3% | n/a | - |
| #17 | Gemini 2.5 Flash Preview (Reasoning) | Google | 84.3% | n/a | - |
| #18 | Gemini 2.5 Pro Preview (May' 25) | Google | 84.3% | n/a | $3.44/M |
| #19 | Qwen3 235B A22B (Reasoning) | Alibaba | 84.0% | 59 tok/s | $2.63/M |
| #20 | GPT-5 (low) | OpenAI | 83.0% | 79.3 tok/s | $3.44/M |
| #21 | Gemini 2.5 Flash (Reasoning) | Google | 82.3% | 221.3 tok/s | $0.850/M |
| #22 | MiniMax M1 40k | MiniMax | 81.3% | n/a | - |
| #23 | Qwen3 32B (Reasoning) | Alibaba | 80.7% | 98.4 tok/s | $0.276/M |
| #24 | Sonar Reasoning Pro | Perplexity | 79.0% | n/a | - |
| #25 | QwQ 32B | Alibaba | 78.0% | 31 tok/s | $0.745/M |
| #26 | Claude 4 Sonnet (Reasoning) | Anthropic | 77.3% | 45.5 tok/s | $6.56/M |
| #27 | o3-mini | OpenAI | 77.0% | 203.3 tok/s | $1.93/M |
| #28 | Sonar Reasoning | Perplexity | 77.0% | n/a | - |
| #29 | Qwen3 14B (Reasoning) | Alibaba | 76.3% | 63.5 tok/s | $0.731/M |
| #30 | Claude 4 Opus (Reasoning) | Anthropic | 75.7% | 36.4 tok/s | $32.81/M |
| #31 | Qwen3 30B A3B (Reasoning) | Alibaba | 75.3% | 68.5 tok/s | $0.180/M |
| #32 | Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | NVIDIA | 74.7% | 52.7 tok/s | $0.900/M |
| #33 | Qwen3 8B (Reasoning) | Alibaba | 74.7% | 62.5 tok/s | $0.370/M |
| #34 | Qwen3 30B A3B 2507 Instruct | Alibaba | 72.7% | 105.2 tok/s | $0.213/M |
| #35 | o1 | OpenAI | 72.3% | 123.2 tok/s | $26.25/M |
| #36 | Qwen3 235B A22B 2507 Instruct | Alibaba | 71.7% | 42.5 tok/s | $0.356/M |
| #37 | Magistral Small 1 | Mistral | 71.3% | n/a | - |
| #38 | Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | NVIDIA | 70.7% | n/a | - |
| #39 | Gemini 2.5 Flash-Lite (Reasoning) | Google | 70.3% | 265.2 tok/s | $0.175/M |
| #40 | Magistral Medium 1 | Mistral | 70.0% | n/a | - |
| #41 | Kimi K2 | Kimi | 69.3% | 24.3 tok/s | $1.04/M |
| #42 | Solar Pro 2 (Reasoning) | Upstage | 69.0% | n/a | - |
| #43 | DeepSeek R1 Distill Qwen 32B | DeepSeek | 68.7% | n/a | - |
| #44 | DeepSeek R1 (Jan '25) | DeepSeek | 68.3% | n/a | $2.43/M |
| #45 | GLM-4.5-Air | Z AI | 67.3% | 74.5 tok/s | $0.372/M |
| #46 | DeepSeek R1 Distill Llama 70B | DeepSeek | 67.0% | 44.7 tok/s | $0.787/M |
| #47 | DeepSeek R1 Distill Qwen 14B | DeepSeek | 66.7% | n/a | - |
| #48 | Solar Pro 2 (Preview) (Reasoning) | Upstage | 66.3% | n/a | - |
| #49 | Qwen3 4B (Reasoning) | Alibaba | 65.7% | n/a | $0.398/M |
| #50 | DeepSeek R1 0528 Qwen3 8B | DeepSeek | 65.0% | n/a | - |
| #51 | o1-mini | OpenAI | 60.3% | n/a | - |
| #52 | Llama 3.3 Nemotron Super 49B v1 (Reasoning) | NVIDIA | 58.3% | n/a | - |
| #53 | Claude 4 Opus (Non-reasoning) | Anthropic | 56.3% | 33.9 tok/s | $32.81/M |
| #54 | DeepSeek V3 0324 | DeepSeek | 52.0% | n/a | $1.21/M |
| #55 | Qwen3 1.7B (Reasoning) | Alibaba | 51.0% | n/a | $0.398/M |
| #56 | Reka Flash 3 | Reka AI | 51.0% | 93.2 tok/s | $0.350/M |
| #57 | Gemini 2.0 Flash Thinking Experimental (Jan '25) | Google | 50.0% | n/a | - |
| #58 | Gemini 2.5 Flash (Non-reasoning) | Google | 50.0% | 185.1 tok/s | $0.850/M |
| #59 | Gemini 2.5 Flash-Lite (Non-reasoning) | Google | 50.0% | 229.5 tok/s | $0.175/M |
| #60 | ERNIE 4.5 300B A47B | Baidu | 49.3% | 23.7 tok/s | $0.485/M |
| #61 | Claude 3.7 Sonnet (Reasoning) | Anthropic | 48.7% | n/a | - |
| #62 | Sonar | Perplexity | 48.7% | n/a | - |
| #63 | Qwen3 Coder 480B A35B Instruct | Alibaba | 47.7% | 61 tok/s | $0.675/M |
| #64 | EXAONE 4.0 32B (Non-reasoning) | LG AI Research | 47.0% | n/a | - |
| #65 | QwQ 32B-Preview | Alibaba | 45.3% | n/a | - |
| #66 | Mistral Medium 3 | Mistral | 44.0% | 42.2 tok/s | $0.800/M |
| #67 | GPT-4.1 | OpenAI | 43.7% | 128.3 tok/s | $3.50/M |
| #68 | Gemini 2.5 Flash Preview (Non-reasoning) | Google | 43.3% | n/a | - |
| #69 | GPT-4.1 mini | OpenAI | 43.0% | 79.3 tok/s | $0.700/M |
| #70 | Claude 4 Sonnet (Non-reasoning) | Anthropic | 40.7% | 45.2 tok/s | $6.56/M |
| #71 | Solar Pro 2 (Non-reasoning) | Upstage | 40.7% | n/a | - |
| #72 | Llama 4 Maverick | Meta | 39.0% | 92.9 tok/s | $0.475/M |
| #73 | GPT-5 (minimal) | OpenAI | 36.7% | 67.3 tok/s | $3.44/M |
| #74 | Gemini 2.0 Pro Experimental (Feb '25) | Google | 36.0% | n/a | - |
| #75 | DeepSeek R1 Distill Llama 8B | DeepSeek | 33.3% | n/a | - |
| #76 | Gemini 2.0 Flash (Feb '25) | Google | 33.0% | n/a | $0.262/M |
| #77 | Grok 3 | xAI | 33.0% | n/a | $8.00/M |
| #78 | GPT-4o (March 2025, chatgpt-4o-latest) | OpenAI | 32.7% | n/a | - |
| #79 | Qwen3 235B A22B (Non-reasoning) | Alibaba | 32.7% | 65.4 tok/s | $0.787/M |
| #80 | Mistral Small 3.2 | Mistral | 32.3% | 127.1 tok/s | $0.128/M |