Math leaderboard from math-specific benchmark metrics.
The domain score averages each model's relative percentile across the included metrics.
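The page does not spell out how "relative percentile" is defined. As a minimal sketch, assume each metric's raw scores are converted to a 0 to 100 percentile rank among the models evaluated on that metric (best model = 100, worst = 0), and the domain score is the mean of those percentiles per model. The function names and the example metrics below are hypothetical and are not taken from the leaderboard's actual pipeline.

```python
from statistics import mean


def percentile_ranks(scores: dict[str, float]) -> dict[str, float]:
    """Map each model's raw score on one metric to a 0-100 percentile.

    Assumption (hypothetical definition): percentile = share of the *other*
    models whose score is at or below this model's score. The single best
    model gets 100 and the single worst gets 0; tied models share the
    higher value.
    """
    models = list(scores)
    n = len(models)
    if n == 1:
        # Only one model evaluated: treat it as the top of the range.
        return {models[0]: 100.0}
    out = {}
    for m in models:
        at_or_below = sum(
            1 for other in models if other != m and scores[other] <= scores[m]
        )
        out[m] = 100.0 * at_or_below / (n - 1)
    return out


def domain_scores(metrics: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average each model's percentile over the metrics it was evaluated on.

    Models missing from a metric are simply skipped for that metric.
    """
    per_model: dict[str, list[float]] = {}
    for metric_scores in metrics.values():
        for model, pct in percentile_ranks(metric_scores).items():
            per_model.setdefault(model, []).append(pct)
    return {model: round(mean(pcts), 1) for model, pcts in per_model.items()}


# Hypothetical raw scores for three models on three math metrics.
metrics = {
    "metric_a": {"A": 90.0, "B": 80.0, "C": 70.0},
    "metric_b": {"A": 60.0, "B": 55.0, "C": 50.0},
    "metric_c": {"A": 40.0, "B": 85.0, "C": 95.0},
}
print(domain_scores(metrics))  # {'A': 66.7, 'B': 50.0, 'C': 33.3}
```

Under this reading, a model only reaches 100.0 by ranking first on every included metric; a different percentile convention (for example, scaling by the top score instead of by rank) would produce different numbers.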
| Rank | Model | Creator | Domain Score | Speed | Blended Price |
|---|---|---|---|---|---|
| #1 | GPT-5.2 (xhigh) | OpenAI | 100.0 | 71 tok/s | $4.81/M |
| #2 |  | OpenAI | 99.6 | 171.1 tok/s | $3.44/M |
| #3 | Gemini 3 Flash Preview (Reasoning) | Google | 99.3 | 172.8 tok/s | $1.13/M |
| #4 | DeepSeek V3.2 Speciale | DeepSeek | 98.9 | n/a | n/a |
| #5 | GPT-5 (high) | OpenAI | 98.6 | 111.1 tok/s | $3.44/M |
| #6 | GPT-5.2 (medium) | OpenAI | 98.5 | n/a | $4.81/M |
| #7 | MiMo-V2-Flash (Reasoning) | Xiaomi | 98.1 | 129.5 tok/s | $0.150/M |
| #8 | Gemini 3 Pro Preview (high) | Google | 97.8 | n/a | $4.50/M |
| #9 | GPT-5.1 Codex (high) | OpenAI | 97.4 | 182.1 tok/s | $3.44/M |
| #10 | Grok 4 | xAI | 97.1 | n/a | $11.00/M |
| #11 | GLM-4.7 (Reasoning) | Z AI | 97.0 | 79.2 tok/s | $1.00/M |
| #12 | KAT-Coder-Pro V1 | KwaiKAT | 96.6 | 114.7 tok/s | $0.525/M |
| #13 | GPT-5 (medium) | OpenAI | 96.5 | 85.6 tok/s | $3.44/M |
| #14 | Kimi K2 Thinking | Kimi | 96.3 | 131.1 tok/s | $1.08/M |
| #15 | o4-mini (high) | OpenAI | 95.8 | 151 tok/s | $1.93/M |
| #16 | Nova 2.0 Lite (high) | Amazon | 95.5 | 177.3 tok/s | $0.850/M |
| #17 | Qwen3 235B A22B 2507 (Reasoning) | Alibaba | 95.2 | 59.4 tok/s | $0.838/M |
| #18 | GPT-5.1 (high) | OpenAI | 95.1 | 121.2 tok/s | $3.44/M |
| #19 | gpt-oss-120b (high) | OpenAI | 94.8 | 358.8 tok/s | $0.262/M |
| #20 | o3-mini (high) | OpenAI | 94.4 | 218.5 tok/s | $1.93/M |
| #21 | o3 | OpenAI | 94.2 | 159.6 tok/s | $3.50/M |
| #22 | DeepSeek V3.2 (Reasoning) | DeepSeek | 94.0 | n/a | $0.337/M |
| #23 | Gemini 2.5 Pro Preview (May '25) | Google | 93.6 | n/a | $3.44/M |
| #24 | Grok 3 mini Reasoning (high) | xAI | 93.3 | 58.8 tok/s | $0.350/M |
| #25 | GPT-5.1 Codex mini (high) | OpenAI | 93.3 | 213.6 tok/s | $0.688/M |
| #26 | Gemini 2.5 Pro Preview (Mar '25) | Google | 93.2 | n/a | n/a |
| #27 | Claude Opus 4.5 (Reasoning) | Anthropic | 92.9 | 53.5 tok/s | $10.94/M |
| #28 | NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | NVIDIA | 92.5 | 133.6 tok/s | $0.096/M |
| #29 | Gemini 2.5 Flash Preview (Reasoning) | Google | 92.1 | n/a | n/a |
| #30 | GPT-5 mini (high) | OpenAI | 91.8 | 87.4 tok/s | $0.688/M |
| #31 | K-EXAONE (Reasoning) | LG AI Research | 91.0 | n/a | n/a |
| #32 | DeepSeek V3.1 (Reasoning) | DeepSeek | 90.7 | n/a | $0.865/M |
| #33 | DeepSeek V3.1 Terminus (Reasoning) | DeepSeek | 90.3 | n/a | $1.91/M |
| #34 | Grok 4 Fast (Reasoning) | xAI | 89.9 | n/a | $0.275/M |
| #35 | Nova 2.0 Omni (medium) | Amazon | 89.6 | n/a | $0.850/M |
| #36 | gpt-oss-20b (high) | OpenAI | 89.2 | 272.3 tok/s | $0.088/M |
| #37 | GPT-5 (low) | OpenAI | 88.8 | 79.3 tok/s | $3.44/M |
| #38 | Grok 4.1 Fast (Reasoning) | xAI | 88.8 | n/a | n/a |
| #39 | Gemini 2.5 Pro | Google | 88.8 | 139.8 tok/s | $3.44/M |
| #40 | Ring-1T | InclusionAI | 88.4 | n/a | n/a |
| #41 | Nova 2.0 Pro Preview (medium) | Amazon | 88.1 | 138.3 tok/s | $3.44/M |
| #42 | Nova 2.0 Lite (medium) | Amazon | 87.7 | 135.5 tok/s | $0.850/M |
| #43 | o3-mini | OpenAI | 87.5 | 203.3 tok/s | $1.93/M |
| #44 | DeepSeek R1 0528 (May '25) | DeepSeek | 87.0 | n/a | $2.06/M |
| #45 | Qwen3 VL 235B A22B (Reasoning) | Alibaba | 86.9 | 32.5 tok/s | $2.17/M |
| #46 | Llama Nemotron Super 49B v1.5 (Reasoning) | NVIDIA | 86.8 | 44.2 tok/s | $0.175/M |
| #47 | Apriel-v1.6-15B-Thinker | ServiceNow | 86.6 | n/a | n/a |
| #48 | Claude 4.5 Sonnet (Reasoning) | Anthropic | 86.2 | 50.1 tok/s | $6.56/M |
| #49 | EXAONE 4.0 32B (Reasoning) | LG AI Research | 85.9 | n/a | n/a |
| #50 | INTELLECT-3 | Prime Intellect | 85.8 | n/a | n/a |
| #51 | DeepSeek V3.2 Exp (Reasoning) | DeepSeek | 85.4 | n/a | $0.310/M |
| #52 | Claude 4 Sonnet (Reasoning) | Anthropic | 85.1 | 45.5 tok/s | $6.56/M |
| #53 | GLM-4.5 (Reasoning) | Z AI | 84.9 | 50.1 tok/s | $1.00/M |
| #54 | Apriel-v1.5-15B-Thinker | ServiceNow | 84.7 | n/a | n/a |
| #55 | o1 | OpenAI | 84.7 | 123.2 tok/s | $26.25/M |
| #56 | Sonar Reasoning Pro | Perplexity | 84.5 | n/a | n/a |
| #57 | Gemini 3 Pro Preview (low) | Google | 84.3 | n/a | $4.50/M |
| #58 | GLM-4.6 (Reasoning) | Z AI | 84.0 | 43.9 tok/s | $0.963/M |
| #59 | Gemini 2.5 Flash (Reasoning) | Google | 83.6 | 221.3 tok/s | $0.850/M |
| #60 | GLM-4.6V (Reasoning) | Z AI | 83.6 | 44.9 tok/s | $0.450/M |
| #61 | ERNIE 5.0 Thinking Preview | Baidu | 83.2 | n/a | n/a |
| #62 | GPT-5 mini (medium) | OpenAI | 82.8 | 86.7 tok/s | $0.688/M |
| #63 | Claude 4 Opus (Reasoning) | Anthropic | 82.4 | 36.4 tok/s | $32.81/M |
| #64 | Qwen3 VL 32B (Reasoning) | Alibaba | 82.1 | 98.3 tok/s | $2.63/M |
| #65 | Seed-OSS-36B-Instruct | ByteDance Seed | 81.7 | 40.4 tok/s | $0.300/M |
| #66 | Qwen3 Next 80B A3B (Reasoning) | Alibaba | 81.3 | 135.7 tok/s | $1.88/M |
| #67 | Claude 4.5 Haiku (Reasoning) | Anthropic | 81.0 | 152.2 tok/s | $2.19/M |
| #68 | GPT-5 nano (high) | OpenAI | 80.6 | 150.4 tok/s | $0.138/M |
| #69 | R1 1776 | Perplexity | 80.5 | n/a | n/a |
| #70 | MiniMax M1 80k | MiniMax | 80.4 | n/a | $0.963/M |
| #71 | Ring-flash-2.0 | InclusionAI | 80.2 | n/a | $0.247/M |
| #72 | Qwen3 30B A3B 2507 (Reasoning) | Alibaba | 79.8 | 139.3 tok/s | $0.673/M |
| #73 | Qwen3 235B A22B (Reasoning) | Alibaba | 79.8 | 59 tok/s | $2.63/M |
| #74 | Qwen3 32B (Reasoning) | Alibaba | 79.7 | 98.4 tok/s | $0.276/M |
| #75 | GLM-4.5-Air | Z AI | 79.6 | 74.5 tok/s | $0.372/M |
| #76 | Qwen3 235B A22B 2507 Instruct | Alibaba | 79.6 | 42.5 tok/s | $0.356/M |
| #77 | MiniMax-M2.1 | MiniMax | 79.5 | 184.6 tok/s | $0.525/M |
| #78 | Qwen3 4B 2507 (Reasoning) | Alibaba | 79.1 | n/a | n/a |
| #79 | Qwen3 Max Thinking (Preview) | Alibaba | 78.7 | 50.7 tok/s | $2.40/M |
| #80 | Qwen3 VL 30B A3B (Reasoning) | Alibaba | 78.4 | 126.8 tok/s | $0.338/M |