Scientific coding benchmark score.
SciCode converts real scientific research problems into coding tasks. The project reports 80 main problems decomposed into 338 subproblems across scientific domains, requiring knowledge recall, reasoning, and code synthesis.
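Because each main problem is split into subproblems, a score can be computed at two levels. SciCode reports both: the fraction of subproblems solved, and the fraction of main problems whose subproblems all pass. The sketch below computes both rates from made-up pass/fail results. `MainProblem` and `resolve_rates` are hypothetical names, and this page does not say which of the two levels the percentages below reflect.

```python
from dataclasses import dataclass


@dataclass
class MainProblem:
    """A main problem and the pass/fail result of each of its subproblems."""
    name: str
    subproblem_passed: list[bool]


def resolve_rates(problems: list[MainProblem]) -> tuple[float, float]:
    """Return (subproblem resolve rate, main-problem resolve rate)."""
    total_sub = sum(len(p.subproblem_passed) for p in problems)
    passed_sub = sum(sum(p.subproblem_passed) for p in problems)
    # In this sketch a main problem counts as solved only if every subproblem passes.
    solved_main = sum(all(p.subproblem_passed) for p in problems)
    return passed_sub / total_sub, solved_main / len(problems)


# Hypothetical results: three main problems, seven subproblems in total.
demo = [
    MainProblem("problem_a", [True, True, True]),
    MainProblem("problem_b", [True, False]),
    MainProblem("problem_c", [True, True]),
]
sub_rate, main_rate = resolve_rates(demo)
print(f"subproblem: {sub_rate:.1%}, main problem: {main_rate:.1%}")
# subproblem: 85.7%, main problem: 66.7%
```

In this sketch a single failing subproblem sinks its whole main problem. That is why main-problem rates run well below subproblem rates.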
Test type: Research coding benchmark with scientist-authored solutions and test cases.
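To grade against scientist-written test cases, the generated code is run and its outputs are compared with reference values, typically within a numerical tolerance. The following is a minimal sketch of that idea and is not SciCode's actual harness. `passes_tests`, the decay-law subproblem, and the `1e-6` relative tolerance are all assumptions for illustration.

```python
import math
from typing import Callable


def passes_tests(candidate: Callable[[float], float],
                 cases: list[tuple[float, float]],
                 rel_tol: float = 1e-6) -> bool:
    """True if candidate(x) matches every reference output within rel_tol."""
    for x, expected in cases:
        try:
            got = candidate(x)
        except Exception:
            return False  # a crash counts as a failed test
        if not math.isclose(got, expected, rel_tol=rel_tol):
            return False
    return True


# Hypothetical subproblem: fraction of nuclei left after time t, decay constant 0.5.
def reference_decay(t: float) -> float:
    """Stand-in for the scientist-authored reference solution."""
    return math.exp(-0.5 * t)


cases = [(t, reference_decay(t)) for t in (0.0, 1.0, 2.0)]


def model_decay(t: float) -> float:
    """Algebraically equivalent answer; tiny rounding differences are tolerated."""
    return 1.0 / math.exp(0.5 * t)


def buggy_decay(t: float) -> float:
    """Wrong decay constant: agrees with the reference at t = 0 only."""
    return math.exp(-t)


print(passes_tests(model_decay, cases))  # True
print(passes_tests(buggy_decay, cases))  # False
```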
498 models have a SciCode score in this snapshot.
Current leader: Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback)
This app ranks models by the SciCode score reported in the Artificial Analysis snapshot.
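The ranking itself is a sort on the score column. This sketch assumes rows shaped like the table below. It parses each row, treating `n/a` and `-` as missing, then sorts by score in descending order and renumbers. `Row`, `parse_number`, `parse_row`, and `rank` are hypothetical names. Ties keep their input order here; the app's own tie-break rule is not stated on this page. The three sample rows are copied from the table.

```python
from dataclasses import dataclass
from typing import Optional

MISSING = {"n/a", "-", ""}  # cell values the table uses for "not reported"


@dataclass
class Row:
    model: str
    creator: str
    score: float            # SciCode score in percent
    speed: Optional[float]  # tokens per second, None if not reported
    price: Optional[float]  # blended USD per 1M tokens, None if not reported


def parse_number(cell: str, prefix: str = "", suffix: str = "") -> Optional[float]:
    """Strip a unit prefix/suffix (e.g. '$', '/M', 'tok/s') and parse; missing -> None."""
    cell = cell.strip()
    if cell in MISSING:
        return None
    return float(cell.removeprefix(prefix).removesuffix(suffix).strip())


def parse_row(line: str) -> Row:
    """Parse one markdown row: | rank | model | creator | score | speed | price |."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    _rank, model, creator, score, speed, price = cells  # rank is recomputed by rank()
    return Row(
        model=model,
        creator=creator,
        score=float(score.removesuffix("%")),
        speed=parse_number(speed, suffix="tok/s"),
        price=parse_number(price, prefix="$", suffix="/M"),
    )


def rank(rows: list[Row]) -> list[tuple[int, Row]]:
    """Sort by score, highest first; Python's sort is stable, so ties keep input order."""
    ordered = sorted(rows, key=lambda r: r.score, reverse=True)
    return list(enumerate(ordered, start=1))


SAMPLE = """\
| #18 | Muse Spark | Meta | 51.5% | n/a | - |
| #11 | Kimi K2.6 | Kimi | 53.5% | 41.6 tok/s | $1.71/M |
| #3 | GPT-5.4 (xhigh) | OpenAI | 56.6% | 75.5 tok/s | $5.63/M |
"""

rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]
ranked = rank(rows)
for position, row in ranked:
    print(position, row.model, row.score, row.speed, row.price)
# 1 GPT-5.4 (xhigh) 56.6 75.5 5.63
# 2 Kimi K2.6 53.5 41.6 1.71
# 3 Muse Spark 51.5 None None
```

The ranks are recomputed within the sample, so they differ from the global ranks in the table.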
Top 80 models ranked by SciCode score.
| Rank | Model | Creator | SciCode score | Speed | Blended price |
|---|---|---|---|---|---|
| #1 | Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) | Anthropic | 60.2% | n/a | $20.00/M |
| #2 |  |  | 58.9% | 124.7 tok/s | $4.50/M |
| #3 | GPT-5.4 (xhigh) | OpenAI | 56.6% | 75.5 tok/s | $5.63/M |
| #4 | Gemini 3 Pro Preview (high) | Google | 56.1% | n/a | $4.50/M |
| #5 | GPT-5.5 (xhigh) | OpenAI | 56.1% | 69 tok/s | $11.25/M |
| #6 | GPT-5.5 (high) | OpenAI | 55.9% | 61.6 tok/s | $11.25/M |
| #7 | GPT-5.2 Codex (xhigh) | OpenAI | 54.6% | 105.3 tok/s | $4.81/M |
| #8 | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | Anthropic | 54.5% | 53.8 tok/s | $10.00/M |
| #9 | Claude Opus 4.8 (Adaptive Reasoning, Max Effort) | Anthropic | 53.5% | 67.8 tok/s | $10.00/M |
| #10 | GPT-5.5 (medium) | OpenAI | 53.5% | 58.7 tok/s | $11.25/M |
| #11 | Kimi K2.6 | Kimi | 53.5% | 41.6 tok/s | $1.71/M |
| #12 | GPT-5.3 Codex (xhigh) | OpenAI | 53.2% | 84.5 tok/s | $4.81/M |
| #13 | Gemini 3.5 Flash (high) | Google | 53.1% | 203.3 tok/s | $3.38/M |
| #14 | Gemini 3.5 Flash (medium) | Google | 53.0% | 210.1 tok/s | $3.38/M |
| #15 | GPT-5.2 (xhigh) | OpenAI | 52.1% | 71 tok/s | $4.81/M |
| #16 | Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 51.9% | 47.3 tok/s | $10.94/M |
| #17 | GPT-5.5 (low) | OpenAI | 51.6% | 66.4 tok/s | $11.25/M |
| #18 | Muse Spark | Meta | 51.5% | n/a | n/a |
| #19 | Gemini 3 Flash Preview (Reasoning) | Google | 50.6% | 172.8 tok/s | $1.13/M |
| #20 | GPT-5.4 (low) | OpenAI | 50.3% | 63.6 tok/s | $5.63/M |
| #21 | GPT-5.5 Instant (May 2026) | OpenAI | 50.3% | n/a | $11.25/M |
| #22 | MiMo-V2.5-Pro | Xiaomi | 50.2% | 43.3 tok/s | $0.544/M |
| #23 | Claude Opus 4.7 (Non-reasoning, High Effort) | Anthropic | 50.1% | 46 tok/s | $10.00/M |
| #24 | DeepSeek V4 Pro (Reasoning, Max Effort) | DeepSeek | 50.0% | 61.6 tok/s | $0.544/M |
| #25 | Gemini 3 Flash Preview (Non-reasoning) | Google | 49.9% | 181.3 tok/s | $1.13/M |
| #26 | Gemini 3 Pro Preview (low) | Google | 49.9% | n/a | $4.50/M |
| #27 | GPT-5.4 mini (xhigh) | OpenAI | 49.9% | 178.8 tok/s | $1.69/M |
| #28 | Claude Opus 4.5 (Reasoning) | Anthropic | 49.5% | 53.5 tok/s | $10.94/M |
| #29 | Kimi K2.5 (Reasoning) | Kimi | 49.0% | 31.7 tok/s | $1.19/M |
| #30 | Gemini 3.5 Flash (minimal) | Google | 48.8% | 202.7 tok/s | $3.38/M |
| #31 | Qwen3.7 Max | Alibaba | 48.8% | 186.5 tok/s | $3.75/M |
| #32 | GPT-5.5 (Non-reasoning) | OpenAI | 47.3% | 54.4 tok/s | $11.25/M |
| #33 | Grok 4.3 (high) | xAI | 47.3% | 159.7 tok/s | $1.56/M |
| #34 | GPT-5.4 (Non-reasoning) | OpenAI | 47.1% | 59.3 tok/s | $5.63/M |
| #35 | Claude Opus 4.5 (Non-reasoning) | Anthropic | 47.0% | 47.6 tok/s | $10.94/M |
| #36 | MiniMax-M2.7 | MiniMax | 47.0% | 75 tok/s | $0.525/M |
| #37 | Claude Sonnet 4.6 (Non-reasoning, High Effort) | Anthropic | 46.9% | 49.1 tok/s | $6.00/M |
| #38 | GPT-5.4 nano (xhigh) | OpenAI | 46.9% | 147.6 tok/s | $0.463/M |
| #39 | Qwen3.6 Max Preview | Alibaba | 46.9% | 40.9 tok/s | $2.93/M |
| #40 | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 46.8% | 63.2 tok/s | $6.00/M |
| #41 | o4-mini (high) | OpenAI | 46.5% | 151 tok/s | $1.93/M |
| #42 | DeepSeek V4 Pro (Reasoning, High Effort) | DeepSeek | 46.4% | 65.7 tok/s | $0.544/M |
| #43 | GLM-5 (Reasoning) | Z AI | 46.2% | 79.5 tok/s | $1.55/M |
| #44 | GPT-5.2 (medium) | OpenAI | 46.2% | n/a | $4.81/M |
| #45 | Claude Opus 4.6 (Non-reasoning, High Effort) | Anthropic | 45.7% | 40.9 tok/s | $10.94/M |
| #46 | Grok 4 | xAI | 45.7% | n/a | $11.00/M |
| #47 | Grok 4.20 0309 v2 (Reasoning) | xAI | 45.6% | 168.7 tok/s | $3.00/M |
| #48 | Qwen3.7 Plus | Alibaba | 45.5% | 53.6 tok/s | $0.590/M |
| #49 | MiniMax-M3 | MiniMax | 45.4% | 45.6 tok/s | $0.525/M |
| #50 | GLM-4.7 (Reasoning) | Z AI | 45.1% | 79.2 tok/s | $1.00/M |
| #51 | DeepSeek V4 Flash (Reasoning, Max Effort) | DeepSeek | 44.9% | 98.3 tok/s | $0.175/M |
| #52 | Claude 4.5 Sonnet (Reasoning) | Anthropic | 44.7% | 50.1 tok/s | $6.56/M |
| #53 | Grok 4.20 0309 (Reasoning) | xAI | 44.7% | 166.5 tok/s | $3.00/M |
| #54 | Grok 4.3 (medium) | xAI | 44.6% | 136.9 tok/s | $1.56/M |
| #55 | GPT-5.4 mini (medium) | OpenAI | 44.2% | 177.9 tok/s | $1.69/M |
| #56 | Grok 4 Fast (Reasoning) | xAI | 44.2% | n/a | $0.275/M |
| #57 | Grok 4.1 Fast (Reasoning) | xAI | 44.2% | n/a | n/a |
| #58 | Claude Sonnet 4.6 (Non-reasoning, Low Effort) | Anthropic | 44.1% | 50.1 tok/s | $6.00/M |
| #59 | DeepSeek V3.2 Speciale | DeepSeek | 44.0% | n/a | n/a |
| #60 | GLM-5.1 (Reasoning) | Z AI | 43.8% | 46.8 tok/s | $2.15/M |
| #61 | GLM-5-Turbo | Z AI | 43.6% | n/a | n/a |
| #62 | GLM 5V Turbo (Reasoning) | Z AI | 43.5% | n/a | n/a |
| #63 | Gemma 4 31B (Reasoning) | Google | 43.4% | 34.8 tok/s | n/a |
| #64 | Claude 4.5 Haiku (Reasoning) | Anthropic | 43.3% | 148.3 tok/s | $2.00/M |
| #65 | GPT-5.1 (high) | OpenAI | 43.3% | 121.2 tok/s | $3.44/M |
| #66 | MiMo-V2.5 | Xiaomi | 43.1% | 77.4 tok/s | $0.175/M |
| #67 | Qwen3 Max Thinking | Alibaba | 43.1% | n/a | $2.40/M |
| #68 | GPT-5 (high) | OpenAI | 42.9% | 111.1 tok/s | $3.44/M |
| #69 | Claude 4.5 Sonnet (Non-reasoning) | Anthropic | 42.8% | 42.3 tok/s | $6.56/M |
| #70 | Gemini 2.5 Pro | Google | 42.8% | 132 tok/s | $3.44/M |
| #71 | Nova 2.0 Pro Preview (medium) | Amazon | 42.7% | 127.7 tok/s | $3.44/M |
| #72 | GPT-5.1 Codex mini (high) | OpenAI | 42.6% | 213.6 tok/s | $0.688/M |
| #73 | MiniMax-M2.5 | MiniMax | 42.6% | 202.9 tok/s | $0.525/M |
| #74 | MiMo-V2-Pro | Xiaomi | 42.5% | 42.5 tok/s | $1.50/M |
| #75 | DeepSeek V4 Pro (Non-reasoning) | DeepSeek | 42.4% | 67 tok/s | $0.544/M |
| #76 | Kimi K2 Thinking | Kimi | 42.4% | 131.1 tok/s | $1.08/M |
| #77 | Qwen3 235B A22B 2507 (Reasoning) | Alibaba | 42.4% | 59.4 tok/s | $0.838/M |
| #78 | Ring-2.6-1T | InclusionAI | 42.4% | 122.1 tok/s | $0.850/M |
| #79 | DeepSeek V4 Flash (Reasoning, High Effort) | DeepSeek | 42.0% | n/a | $0.175/M |
| #80 | Qwen3.5 122B A10B (Reasoning) | Alibaba | 42.0% | 143.6 tok/s | $1.10/M |
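The Blended Price column combines input and output token prices into a single figure in USD per million tokens. The sketch below assumes a 3:1 input-to-output weighting, which is the convention Artificial Analysis uses as far as we know. `blended_price` is a hypothetical helper, and the $5/$25 prices are illustrative.

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output USD prices per 1M tokens (3:1 by default)."""
    total_weight = input_weight + output_weight
    return (input_weight * input_per_m + output_weight * output_per_m) / total_weight


# Illustrative prices: $5 per 1M input tokens, $25 per 1M output tokens.
print(f"${blended_price(5.0, 25.0):.2f}/M")  # $10.00/M
```

A fixed weighting suits prompt-heavy workloads. Reasoning models that emit many output tokens can cost more in practice than the blended figure suggests.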