Difficult broad knowledge and reasoning benchmark score.
Humanity's Last Exam is a broad expert benchmark from CAIS and Scale AI. The public project describes 2,500 difficult questions across many subjects, with closed-ended answers for automatic grading and held-out questions to monitor overfitting.
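Closed-ended answers make automatic scoring possible, and a percentage such as 53.3% is accuracy: correct answers divided by questions attempted. The sketch below is a simplified exact-match grader and is not the official HLE pipeline, which may rely on a more lenient equivalence check (for example, a model-based judge). `Question`, `grade`, and the `held_out` flag are hypothetical names. Reporting public and held-out accuracy separately shows how a held-out split could reveal overfitting.

```python
from dataclasses import dataclass


@dataclass
class Question:
    qid: str
    answer: str             # reference answer for a closed-ended question
    held_out: bool = False  # hypothetical flag marking the private split


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.strip().lower().split())


def grade(questions: list[Question], predictions: dict[str, str]) -> dict[str, float]:
    """Exact-match accuracy (in percent) for the public and held-out splits."""
    tally = {"public": [0, 0], "held_out": [0, 0]}  # split -> [correct, total]
    for q in questions:
        split = "held_out" if q.held_out else "public"
        tally[split][1] += 1
        if normalize(predictions.get(q.qid, "")) == normalize(q.answer):
            tally[split][0] += 1
    # Skip any split that has no questions to avoid dividing by zero.
    return {split: 100.0 * correct / total
            for split, (correct, total) in tally.items() if total}


questions = [
    Question("q1", "B"),
    Question("q2", "42"),
    Question("q3", "Paris", held_out=True),
    Question("q4", "C", held_out=True),
]
predictions = {"q1": "b", "q2": "42", "q3": " paris ", "q4": "D"}
scores = grade(questions, predictions)
print(scores)  # {'public': 100.0, 'held_out': 50.0}
# A large public-minus-held-out gap is one hypothetical overfitting signal.
print(scores["public"] - scores["held_out"])  # 50.0
```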
Test type: Closed-ended expert reasoning and knowledge benchmark with automatic grading.
500 models have this metric.
Current leader: Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback)
This app ranks models by the HLE score reported in the Artificial Analysis data snapshot.
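The snapshot does not say how ties are broken. Rows #9 to #11 all show 39.9% yet receive distinct ranks, which suggests (as an assumption) that ordering uses unrounded scores. The sketch below compares ordinal ranking, which gives every row its own number as the table does, with competition ranking, which gives rows that display the same rounded value the same rank. `rank` and the example scores are hypothetical.

```python
def rank(scores: dict[str, float], decimals: int = 1) -> list[tuple[str, int, int, str]]:
    """Rank models by score, highest first.

    Returns (model, ordinal_rank, competition_rank, display_value) tuples.
    Ordinal rank gives every row its own number; competition rank gives
    rows whose displayed (rounded) value is identical the same rank.
    """
    # Sort by score descending; break exact ties by model name so output is deterministic.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    rows = []
    prev_display, comp_rank = None, 0
    for ordinal, (model, score) in enumerate(ordered, start=1):
        display = f"{score:.{decimals}f}%"
        if display != prev_display:
            comp_rank, prev_display = ordinal, display
        rows.append((model, ordinal, comp_rank, display))
    return rows


# Hypothetical unrounded scores (percent); the first three all display as 39.9%.
example = {"model-a": 39.94, "model-b": 39.87, "model-c": 39.91, "model-d": 39.62}
for row in rank(example):
    print(row)
# ('model-a', 1, 1, '39.9%')
# ('model-c', 2, 1, '39.9%')
# ('model-b', 3, 1, '39.9%')
# ('model-d', 4, 4, '39.6%')
```

Ordinal ranks keep the table compact, while competition ranks make clear that rows with the same displayed score are indistinguishable at one decimal place.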
Top models ranked by HLE.
| Rank | Model | Creator | HLE score | Speed | Blended Price |
|---|---|---|---|---|---|
| #1 | Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) | Anthropic | 53.3% | n/a | $20.00/M |
| #2 |  | Anthropic | 45.7% | 67.8 tok/s | $10.00/M |
| #3 | Gemini 3.1 Pro Preview | Google | 44.7% | 124.7 tok/s | $4.50/M |
| #4 | GPT-5.5 (xhigh) | OpenAI | 44.3% | 69 tok/s | $11.25/M |
| #5 | GPT-5.5 (high) | OpenAI | 43.0% | 61.6 tok/s | $11.25/M |
| #6 | GPT-5.4 (xhigh) | OpenAI | 41.6% | 75.5 tok/s | $5.63/M |
| #7 | Gemini 3.5 Flash (high) | Google | 41.0% | 203.3 tok/s | $3.38/M |
| #8 | GPT-5.5 (medium) | OpenAI | 40.6% | 58.7 tok/s | $11.25/M |
| #9 | Gemini 3.5 Flash (medium) | Google | 39.9% | 210.1 tok/s | $3.38/M |
| #10 | GPT-5.3 Codex (xhigh) | OpenAI | 39.9% | 84.5 tok/s | $4.81/M |
| #11 | Muse Spark | Meta | 39.9% | n/a | - |
| #12 | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | Anthropic | 39.6% | 53.8 tok/s | $10.00/M |
| #13 | Qwen3.7 Max | Alibaba | 38.1% | 186.5 tok/s | $3.75/M |
| #14 | Gemini 3 Pro Preview (high) | Google | 37.2% | n/a | $4.50/M |
| #15 | MiniMax-M3 | MiniMax | 37.1% | 45.6 tok/s | $0.525/M |
| #16 | Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 36.7% | 47.3 tok/s | $10.94/M |
| #17 | DeepSeek V4 Pro (Reasoning, Max Effort) | DeepSeek | 35.9% | 61.6 tok/s | $0.544/M |
| #18 | Kimi K2.6 | Kimi | 35.9% | 41.6 tok/s | $1.71/M |
| #19 | GPT-5.2 (xhigh) | OpenAI | 35.4% | 71 tok/s | $4.81/M |
| #20 | Grok 4.3 (high) | xAI | 35.0% | 159.7 tok/s | $1.56/M |
| #21 | Gemini 3 Flash Preview (Reasoning) | Google | 34.7% | 172.8 tok/s | $1.13/M |
| #22 | MiMo-V2.5-Pro | Xiaomi | 33.8% | 43.3 tok/s | $0.544/M |
| #23 | DeepSeek V4 Pro (Reasoning, High Effort) | DeepSeek | 33.5% | 65.7 tok/s | $0.544/M |
| #24 | GPT-5.2 Codex (xhigh) | OpenAI | 33.5% | 105.3 tok/s | $4.81/M |
| #25 | KAT-Coder-Pro V1 | KwaiKAT | 33.4% | 114.7 tok/s | $0.525/M |
| #26 | Qwen3.7 Plus | Alibaba | 33.4% | 53.6 tok/s | $0.590/M |
| #27 | Grok 4.20 0309 v2 (Reasoning) | xAI | 32.2% | 168.7 tok/s | $3.00/M |
| #28 | DeepSeek V4 Flash (Reasoning, Max Effort) | DeepSeek | 32.1% | 98.3 tok/s | $0.175/M |
| #29 | Claude Opus 4.7 (Non-reasoning, High Effort) | Anthropic | 31.2% | 46 tok/s | $10.00/M |
| #30 | GPT-5.5 (low) | OpenAI | 31.0% | 66.4 tok/s | $11.25/M |
| #31 | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 30.0% | 63.2 tok/s | $6.00/M |
| #32 | Grok 4.20 0309 (Reasoning) | xAI | 30.0% | 166.5 tok/s | $3.00/M |
| #33 | Kimi K2.5 (Reasoning) | Kimi | 29.4% | 31.7 tok/s | $1.19/M |
| #34 | GPT-5.4 (low) | OpenAI | 28.9% | 63.6 tok/s | $5.63/M |
| #35 | Qwen3.6 Max Preview | Alibaba | 28.9% | 40.9 tok/s | $2.93/M |
| #36 | Claude Opus 4.5 (Reasoning) | Anthropic | 28.4% | 53.5 tok/s | $10.94/M |
| #37 | MiMo-V2-Pro | Xiaomi | 28.3% | 42.5 tok/s | $1.50/M |
| #38 | Grok 4.3 (medium) | xAI | 28.1% | 136.9 tok/s | $1.56/M |
| #39 | MiniMax-M2.7 | MiniMax | 28.1% | 75 tok/s | $0.525/M |
| #40 | GLM-5.1 (Reasoning) | Z AI | 28.0% | 46.8 tok/s | $2.15/M |
| #41 | DeepSeek V4 Flash (Reasoning, High Effort) | DeepSeek | 27.8% | n/a | $0.175/M |
| #42 | Gemini 3 Pro Preview (low) | Google | 27.6% | n/a | $4.50/M |
| #43 | Qwen3.5 397B A17B (Reasoning) | Alibaba | 27.3% | 51.8 tok/s | $1.35/M |
| #44 | GLM-5 (Reasoning) | Z AI | 27.2% | 79.5 tok/s | $1.55/M |
| #45 | GPT-5.4 mini (xhigh) | OpenAI | 26.6% | 178.8 tok/s | $1.69/M |
| #46 | GPT-5 (high) | OpenAI | 26.5% | 111.1 tok/s | $3.44/M |
| #47 | GPT-5.1 (high) | OpenAI | 26.5% | 121.2 tok/s | $3.44/M |
| #48 | GPT-5.4 nano (xhigh) | OpenAI | 26.5% | 147.6 tok/s | $0.463/M |
| #49 | Qwen3 Max Thinking | Alibaba | 26.2% | n/a | $2.40/M |
| #50 | DeepSeek V3.2 Speciale | DeepSeek | 26.1% | n/a | - |
| #51 | Qwen3.6 Plus | Alibaba | 25.7% | 52.8 tok/s | $1.13/M |
| #52 | GLM-5.1 (Non-reasoning) | Z AI | 25.6% | 45.6 tok/s | $2.15/M |
| #53 | GPT-5 Codex (high) | OpenAI | 25.6% | 171.1 tok/s | $3.44/M |
| #54 | Hy3-preview (Reasoning) | Tencent | 25.5% | 96 tok/s | $0.200/M |
| #55 | GLM-5-Turbo | Z AI | 25.4% | n/a | - |
| #56 | MiMo-V2.5 | Xiaomi | 25.2% | 77.4 tok/s | $0.175/M |
| #57 | GLM-4.7 (Reasoning) | Z AI | 25.1% | 79.2 tok/s | $1.00/M |
| #58 | GPT-5.2 (medium) | OpenAI | 24.9% | n/a | $4.81/M |
| #59 | Grok 4.20 0309 v2 (Non-reasoning) | xAI | 24.2% | 160.7 tok/s | $3.00/M |
| #60 | Grok 4 | xAI | 23.9% | n/a | $11.00/M |
| #61 | GPT-5 (medium) | OpenAI | 23.5% | 85.6 tok/s | $3.44/M |
| #62 | GPT-5.1 Codex (high) | OpenAI | 23.4% | 182.1 tok/s | $3.44/M |
| #63 | Qwen3.5 122B A10B (Reasoning) | Alibaba | 23.4% | 143.6 tok/s | $1.10/M |
| #64 | Gemini 3.5 Flash (minimal) | Google | 23.1% | 202.7 tok/s | $3.38/M |
| #65 | Gemma 4 31B (Reasoning) | Google | 22.7% | 34.8 tok/s | - |
| #66 | Step 3.5 Flash 2603 | StepFun | 22.6% | 231 tok/s | $0.150/M |
| #67 | Grok 4.20 0309 (Non-reasoning) | xAI | 22.5% | 158.9 tok/s | $3.00/M |
| #68 | Kimi K2 Thinking | Kimi | 22.3% | 131.1 tok/s | $1.08/M |
| #69 | DeepSeek V3.2 (Reasoning) | DeepSeek | 22.2% | n/a | $0.337/M |
| #70 | MiniMax-M2.1 | MiniMax | 22.2% | 184.6 tok/s | $0.525/M |
| #71 | Qwen3.5 27B (Reasoning) | Alibaba | 22.2% | 82.8 tok/s | $0.825/M |
| #72 | Qwen3.6 27B (Reasoning) | Alibaba | 21.6% | 54.7 tok/s | $1.35/M |
| #73 | Gemini 2.5 Pro | Google | 21.1% | 132 tok/s | $3.44/M |
| #74 | MiMo-V2-Flash (Reasoning) | Xiaomi | 21.1% | 129.5 tok/s | $0.150/M |
| #75 | MiMo-V2-Omni-0327 | Xiaomi | 20.4% | 85.6 tok/s | $0.800/M |
| #76 | GPT-5.5 Instant (May 2026) | OpenAI | 20.3% | n/a | $11.25/M |
| #77 | Qwen3.6 35B A3B (Reasoning) | Alibaba | 20.2% | 159.9 tok/s | $0.557/M |
| #78 | MiMo-V2-Flash (Feb 2026) | Xiaomi | 20.0% | 124.9 tok/s | $0.150/M |
| #79 | o3 | OpenAI | 20.0% | 122.3 tok/s | $3.50/M |
| #80 | MiMo-V2-Omni | Xiaomi | 19.9% | 81.5 tok/s | - |
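The Blended Price column collapses separate input and output prices into one per-million-token figure. A minimal sketch follows, assuming a 3:1 input-to-output weighting. That weighting is an assumption, but it reproduces the $3.44/M shown for GPT-5 (high) from OpenAI's GPT-5 list prices of $1.25 input and $10 output per 1M tokens. The two-versus-three-decimal formatting rule is inferred from the table, the $0.30/$1.20 prices are illustrative only, and all function names are hypothetical.

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted-average USD price per 1M tokens (assumed 3:1 input:output weighting)."""
    total_weight = input_weight + output_weight
    return (input_weight * input_per_m + output_weight * output_per_m) / total_weight


def format_price(usd_per_m: float) -> str:
    """Mimic the table's style: 2 decimals at $1 and above, 3 decimals below $1."""
    decimals = 2 if usd_per_m >= 1 else 3
    return f"${usd_per_m:.{decimals}f}/M"


def generation_seconds(output_tokens: int, tokens_per_s: float) -> float:
    """Rough time to stream a response at the table's Speed value (ignores time to first token)."""
    return output_tokens / tokens_per_s


print(format_price(blended_price(1.25, 10.0)))   # $3.44/M
print(format_price(blended_price(0.30, 1.20)))   # $0.525/M (illustrative prices)
print(round(generation_seconds(10_000, 67.8)))   # 147
```

Because the blend is weighted toward input tokens, a model with cheap input but expensive output can look inexpensive here while still costing more on output-heavy reasoning workloads. Adjust the weights to match your own traffic.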