Easy Benchmarks

Reasoning & Knowledge

Reasoning leaderboard across GPQA, HLE, MMLU-Pro, and related metrics.

Leader

The domain score is the average of a model's relative percentile across the included metrics.

Gemini 3.1 Pro Preview

Domain score 100

MMLU-Pro, GPQA, HLE, SciCode
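The domain score is described above as an average of relative percentiles across metrics. A minimal sketch of that idea follows. The page does not state its exact percentile definition, so this assumes a simple "share of models tied or beaten" rank on each metric. The model names and metric values are hypothetical placeholders, not leaderboard data.

```python
from typing import Dict, List


def percentile_ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Percentage of models that each model ties or beats on one metric.

    Assumption: this rank definition is illustrative, not the site's own.
    """
    n = len(scores)
    return {
        model: 100.0 * sum(other <= value for other in scores.values()) / n
        for model, value in scores.items()
    }


def domain_scores(metrics: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Average each model's percentile rank over the metrics it appears in."""
    per_model: Dict[str, List[float]] = {}
    for metric_scores in metrics.values():
        for model, pct in percentile_ranks(metric_scores).items():
            per_model.setdefault(model, []).append(pct)
    return {model: sum(p) / len(p) for model, p in per_model.items()}


# Hypothetical raw metric values for three placeholder models.
example_metrics = {
    "MMLU-Pro": {"model_a": 0.90, "model_b": 0.85, "model_c": 0.80},
    "GPQA": {"model_a": 0.92, "model_b": 0.70, "model_c": 0.88},
}

if __name__ == "__main__":
    for model, score in sorted(domain_scores(example_metrics).items()):
        print(f"{model}: {score:.1f}")
```

Under this assumed definition, a model that leads every included metric scores 100, and models that trade places across metrics land on the same average.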

Top Reasoning Models

Domain Leaderboard

Rank | Model | Creator | Domain Score | Speed | Blended Price
#1 | Gemini 3.1 Pro Preview | Google | 99.9 | 135.9 tok/s | $4.50/M
#2 | GPT-5.5 (xhigh) | OpenAI | 99.6 | 62.4 tok/s | $11.25/M
#3 | GPT-5.5 (high) | OpenAI | 99.4 | 62.3 tok/s | $11.25/M
#4 | GPT-5.4 (xhigh) | OpenAI | 99.1 | 75.5 tok/s | $5.63/M
#5 | Claude Opus 4.8 (Adaptive Reasoning, Max Effort) | Anthropic | 99.0 | 66.3 tok/s | $10.94/M
#6 | GPT-5.5 (medium) | OpenAI | 98.8 | 59.2 tok/s | $11.25/M
#7 | Gemini 3 Pro Preview (high) | Google | 98.5 | n/a | $4.50/M
#8 | Gemini 3.5 Flash (high) | Google | 98.5 | 212.4 tok/s | $3.38/M
#9 | Gemini 3.5 Flash (medium) | Google | 98.3 | 210.2 tok/s | $3.38/M
#10 | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | Anthropic | 98.2 | 61.8 tok/s | $10.94/M
#11 | GPT-5.3 Codex (xhigh) | OpenAI | 98.1 | 98.6 tok/s | $4.81/M
#12 | Kimi K2.6 | Kimi | 97.5 | 41.6 tok/s | $1.71/M
#13 | GPT-5.2 (xhigh) | OpenAI | 97.0 | 71 tok/s | $4.81/M
#14 | Qwen3.7 Max | Alibaba | 97.0 | 186.5 tok/s | $3.75/M
#15 | Gemini 3 Flash Preview (Reasoning) | Google | 96.9 | 172.8 tok/s | $1.13/M
#16 | GPT-5.2 Codex (xhigh) | OpenAI | 96.9 | 105.3 tok/s | $4.81/M
#17 | Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 96.6 | 47.3 tok/s | $10.94/M
#18 | Muse Spark | Meta | 96.2 | n/a | -
#19 | GPT-5.5 (low) | OpenAI | 96.2 | 60.2 tok/s | $11.25/M
#20 | DeepSeek V4 Pro (Reasoning, Max Effort) | DeepSeek | 95.8 | 61.8 tok/s | $0.544/M
#21 | MiniMax-M3 | MiniMax | 95.8 | 45.6 tok/s | $0.525/M
#22 | Grok 4.3 (high) | xAI | 95.5 | 237.5 tok/s | $1.56/M
#23 | Gemini 3 Pro Preview (low) | Google | 95.2 | n/a | $4.50/M
#24 | Claude Opus 4.7 (Non-reasoning, High Effort) | Anthropic | 94.8 | 48.1 tok/s | $10.94/M
#25 | DeepSeek V4 Pro (Reasoning, High Effort) | DeepSeek | 94.8 | 57.6 tok/s | $0.544/M
#26 | Claude Opus 4.5 (Reasoning) | Anthropic | 94.6 | 53.5 tok/s | $10.94/M
#27 | Grok 4.20 0309 v2 (Reasoning) | xAI | 94.5 | 168.7 tok/s | $3.00/M
#28 | MiMo-V2.5-Pro | Xiaomi | 94.2 | 43.3 tok/s | $0.544/M
#29 | Qwen3.7 Plus | Alibaba | 94.0 | 53.6 tok/s | $0.590/M
#30 | GPT-5.4 (low) | OpenAI | 94.0 | 63.6 tok/s | $5.63/M
#31 | Kimi K2.5 (Reasoning) | Kimi | 93.9 | 31.7 tok/s | $1.19/M
#32 | Qwen3.6 Max Preview | Alibaba | 93.5 | 40.9 tok/s | $2.93/M
#33 | DeepSeek V4 Flash (Reasoning, Max Effort) | DeepSeek | 93.4 | 107.8 tok/s | $0.175/M
#34 | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 93.2 | 70.4 tok/s | $6.56/M
#35 | GPT-5.4 mini (xhigh) | OpenAI | 93.0 | 176 tok/s | $1.69/M
#36 | MiniMax-M2.7 | MiniMax | 92.8 | 75 tok/s | $0.525/M
#37 | Grok 4.20 0309 (Reasoning) | xAI | 92.5 | 166.5 tok/s | $3.00/M
#38 | Grok 4.3 (medium) | xAI | 92.4 | 197.6 tok/s | $1.56/M
#39 | Grok 4 | xAI | 92.1 | n/a | $11.00/M
#40 | GPT-5.1 (high) | OpenAI | 91.8 | 121.2 tok/s | $3.44/M
#41 | DeepSeek V3.2 Speciale | DeepSeek | 91.6 | n/a | -
#42 | GPT-5.2 (medium) | OpenAI | 91.0 | n/a | $4.81/M
#43 | GPT-5 (high) | OpenAI | 90.7 | 111.1 tok/s | $3.44/M
#44 | GLM-5.1 (Reasoning) | Z AI | 90.7 | 46.8 tok/s | $2.15/M
#45 | GLM-4.7 (Reasoning) | Z AI | 90.4 | 79.2 tok/s | $1.00/M
#46 | Qwen3.5 397B A17B (Reasoning) | Alibaba | 90.3 | 51.8 tok/s | $1.35/M
#47 | MiMo-V2-Pro | Xiaomi | 90.1 | 42.5 tok/s | $1.50/M
#48 | GPT-5.5 Instant (May 2026) | OpenAI | 89.4 | n/a | $11.25/M
#49 | DeepSeek V4 Flash (Reasoning, High Effort) | DeepSeek | 89.3 | n/a | $0.175/M
#50 | Qwen3 Max Thinking | Alibaba | 89.3 | n/a | $2.40/M
#51 | GLM-5-Turbo | Z AI | 88.3 | n/a | -
#52 | GPT-5 (medium) | OpenAI | 88.2 | 85.6 tok/s | $3.44/M
#53 | Gemini 2.5 Pro | Google | 88.2 | 139.8 tok/s | $3.44/M
#54 | Qwen3.6 Plus | Alibaba | 88.2 | 52.8 tok/s | $1.13/M
#55 | Gemini 3.5 Flash (minimal) | Google | 88.1 | 199.1 tok/s | $3.38/M
#56 | Claude 4.5 Sonnet (Reasoning) | Anthropic | 88.1 | 50.1 tok/s | $6.56/M
#57 | Gemma 4 31B (Reasoning) | Google | 88.1 | 34.8 tok/s | -
#58 | MiMo-V2.5 | Xiaomi | 88.1 | 77.4 tok/s | $0.175/M
#59 | TEHy3-preview (Reasoning) | Tencent | 88.0 | 96 tok/s | $0.200/M
#60 | GLM-5 (Reasoning) | Z AI | 87.9 | 79.5 tok/s | $1.55/M
#61 | GPT-5.4 nano (xhigh) | OpenAI | 87.9 | 158.3 tok/s | $0.463/M
#62 | GPT-5 Codex (high) | OpenAI | 87.7 | 171.1 tok/s | $3.44/M
#63 | Grok 4.1 Fast (Reasoning) | xAI | 87.7 | n/a | -
#64 | Gemini 3 Flash Preview (Non-reasoning) | Google | 87.6 | 181.3 tok/s | $1.13/M
#65 | GPT-5.1 Codex (high) | OpenAI | 87.6 | 182.1 tok/s | $3.44/M
#66 | Qwen3.5 122B A10B (Reasoning) | Alibaba | 87.0 | 143.6 tok/s | $1.10/M
#67 | MiniMax-M2.1 | MiniMax | 87.0 | 184.6 tok/s | $0.525/M
#68 | Grok 4 Fast (Reasoning) | xAI | 86.8 | n/a | $0.275/M
#69 | o3-pro | OpenAI | 86.6 | 22.7 tok/s | $35.00/M
#70 | Kimi K2 Thinking | Kimi | 86.5 | 131.1 tok/s | $1.08/M
#71 | Claude Opus 4.5 (Non-reasoning) | Anthropic | 86.4 | 47.6 tok/s | $10.94/M
#72 | Claude Opus 4.6 (Non-reasoning, High Effort) | Anthropic | 86.3 | 40.9 tok/s | $10.94/M
#73 | MiniMax-M2.5 | MiniMax | 85.5 | 202.9 tok/s | $0.525/M
#74 | Qwen3.5 397B A17B (Non-reasoning) | Alibaba | 85.2 | 53.1 tok/s | $1.35/M
#75 | Ring-2.6-1T | InclusionAI | 85.2 | 122.1 tok/s | $0.850/M
#76 | o3 | OpenAI | 85.1 | 159.6 tok/s | $3.50/M
#77 | DeepSeek V3.2 (Reasoning) | DeepSeek | 84.9 | n/a | $0.337/M
#78 | MiMo-V2-Flash (Reasoning) | Xiaomi | 84.1 | 129.5 tok/s | $0.150/M
#79 | Qwen3.5 27B (Reasoning) | Alibaba | 83.7 | 82.8 tok/s | $0.825/M
#80 | GPT-5.4 mini (medium) | OpenAI | 83.6 | 184.4 tok/s | $1.69/M