Scientific coding benchmark score.
SciCode converts real scientific research problems into coding tasks. The project reports 80 main problems decomposed into 338 subproblems across scientific domains, requiring knowledge recall, reasoning, and code synthesis.
Test type: Research coding benchmark with scientist-authored solutions and test cases.
472 models report this metric.
Current leader: Gemini 3.1 Pro Preview
This app ranks models by the SciCode score exposed in the Artificial Analysis snapshot.
Top models ranked by SciCode.
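The Blended Price column below can be derived from separate input and output per-million-token prices. A minimal sketch, assuming the 3:1 input-to-output token weighting that Artificial Analysis documents for its blended price (the example prices are illustrative, not taken from the table):

```python
# Sketch of a blended $/M token price, assuming a 3:1 input:output
# token mix. The prices passed in below are illustrative only.

def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Blend per-million-token prices with a 3:1 input:output ratio."""
    return (3 * input_per_m + output_per_m) / 4

# Example: $2.00/M input and $12.00/M output blend to $4.50/M.
print(blended_price(2.00, 12.00))  # 4.5
```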
| Rank | Model | Creator | Value | Speed | Blended Price |
|---|---|---|---|---|---|
| #1 | Gemini 3.1 Pro Preview | Google | 58.9% | 131.2 tok/s | $4.50/M |
| #2 | | OpenAI | 56.6% | 93.5 tok/s | $5.63/M |
| #3 | Gemini 3 Pro Preview (high) | Google | 56.1% | 128.7 tok/s | $4.50/M |
| #4 | GPT-5.5 (xhigh) | OpenAI | 56.1% | 66.1 tok/s | $11.25/M |
| #5 | GPT-5.5 (high) | OpenAI | 55.9% | 59.3 tok/s | $11.25/M |
| #6 | GPT-5.2 Codex (xhigh) | OpenAI | 54.6% | 87.7 tok/s | $4.81/M |
| #7 | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | Anthropic | 54.5% | 51.8 tok/s | $10.00/M |
| #8 | GPT-5.5 (medium) | OpenAI | 53.5% | 57.5 tok/s | $11.25/M |
| #9 | Kimi K2.6 | Kimi | 53.5% | 29.1 tok/s | $1.71/M |
| #10 | GPT-5.3 Codex (xhigh) | OpenAI | 53.2% | 87.1 tok/s | $4.81/M |
| #11 | GPT-5.2 (xhigh) | OpenAI | 52.1% | 71.8 tok/s | $4.81/M |
| #12 | Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 51.9% | 49.9 tok/s | $10.00/M |
| #13 | GPT-5.5 (low) | OpenAI | 51.6% | 56.8 tok/s | $11.25/M |
| #14 | Muse Spark | Meta | 51.5% | n/a | - |
| #15 | Gemini 3 Flash Preview (Reasoning) | Google | 50.6% | 193.2 tok/s | $1.13/M |
| #16 | GPT-5.4 (low) | OpenAI | 50.3% | 59.1 tok/s | $5.63/M |
| #17 | MiMo-V2.5-Pro | Xiaomi | 50.2% | 59.9 tok/s | $1.50/M |
| #18 | Claude Opus 4.7 (Non-reasoning, High Effort) | Anthropic | 50.1% | 43 tok/s | $10.00/M |
| #19 | DeepSeek V4 Pro (Reasoning, Max Effort) | DeepSeek | 50.0% | 34.3 tok/s | $2.18/M |
| #20 | Gemini 3 Flash Preview (Non-reasoning) | Google | 49.9% | 178.3 tok/s | $1.13/M |
| #21 | Gemini 3 Pro Preview (low) | Google | 49.9% | n/a | $4.50/M |
| #22 | GPT-5.4 mini (xhigh) | OpenAI | 49.9% | 158.9 tok/s | $1.69/M |
| #23 | Claude Opus 4.5 (Reasoning) | Anthropic | 49.5% | 57 tok/s | $10.00/M |
| #24 | Kimi K2.5 (Reasoning) | Kimi | 49.0% | 31.6 tok/s | $1.20/M |
| #25 | GPT-5.5 (Non-reasoning) | OpenAI | 47.3% | 51.3 tok/s | $11.25/M |
| #26 | GPT-5.4 (Non-reasoning) | OpenAI | 47.1% | 57.2 tok/s | $5.63/M |
| #27 | Claude Opus 4.5 (Non-reasoning) | Anthropic | 47.0% | 50.3 tok/s | $10.00/M |
| #28 | MiniMax-M2.7 | MiniMax | 47.0% | 43.9 tok/s | $0.525/M |
| #29 | Claude Sonnet 4.6 (Non-reasoning, High Effort) | Anthropic | 46.9% | 48.3 tok/s | $6.00/M |
| #30 | GPT-5.4 nano (xhigh) | OpenAI | 46.9% | 160.3 tok/s | $0.463/M |
| #31 | Qwen3.6 Max Preview | Alibaba | 46.9% | 33.2 tok/s | $2.93/M |
| #32 | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 46.8% | 68 tok/s | $6.00/M |
| #33 | o4-mini (high) | OpenAI | 46.5% | 124.5 tok/s | $1.93/M |
| #34 | DeepSeek V4 Pro (Reasoning, High Effort) | DeepSeek | 46.4% | 32.9 tok/s | $2.18/M |
| #35 | GLM-5 (Reasoning) | Z AI | 46.2% | 64.5 tok/s | $1.55/M |
| #36 | GPT-5.2 (medium) | OpenAI | 46.2% | n/a | $4.81/M |
| #37 | Claude Opus 4.6 (Non-reasoning, High Effort) | Anthropic | 45.7% | 42 tok/s | $10.00/M |
| #38 | Grok 4 | xAI | 45.7% | 50.3 tok/s | $6.00/M |
| #39 | Grok 4.20 0309 v2 (Reasoning) | xAI | 45.6% | 89.3 tok/s | $3.00/M |
| #40 | GLM-4.7 (Reasoning) | Z AI | 45.1% | 90.3 tok/s | $1.00/M |
| #41 | DeepSeek V4 Flash (Reasoning, Max Effort) | DeepSeek | 44.9% | 77.4 tok/s | $0.175/M |
| #42 | Claude 4.5 Sonnet (Reasoning) | Anthropic | 44.7% | 43.8 tok/s | $6.00/M |
| #43 | Grok 4.20 0309 (Reasoning) | xAI | 44.7% | 87.8 tok/s | $3.00/M |
| #44 | GPT-5.4 mini (medium) | OpenAI | 44.2% | 159.2 tok/s | $1.69/M |
| #45 | Grok 4 Fast (Reasoning) | xAI | 44.2% | 76.2 tok/s | $0.275/M |
| #46 | Grok 4.1 Fast (Reasoning) | xAI | 44.2% | 140.9 tok/s | $0.275/M |
| #47 | Claude Sonnet 4.6 (Non-reasoning, Low Effort) | Anthropic | 44.1% | 51.5 tok/s | $6.00/M |
| #48 | DeepSeek V3.2 Speciale | DeepSeek | 44.0% | n/a | - |
| #49 | GLM-5.1 (Reasoning) | Z AI | 43.8% | 45.7 tok/s | $2.15/M |
| #50 | GLM-5-Turbo | Z AI | 43.6% | n/a | - |
| #51 | GLM 5V Turbo (Reasoning) | Z AI | 43.5% | n/a | - |
| #52 | Gemma 4 31B (Reasoning) | Google | 43.4% | 34.8 tok/s | - |
| #53 | Claude 4.5 Haiku (Reasoning) | Anthropic | 43.3% | 103.8 tok/s | $2.00/M |
| #54 | GPT-5.1 (high) | OpenAI | 43.3% | 123.3 tok/s | $3.44/M |
| #55 | MiMo-V2.5 | Xiaomi | 43.1% | n/a | - |
| #56 | Qwen3 Max Thinking | Alibaba | 43.1% | 34.3 tok/s | $2.40/M |
| #57 | GPT-5 (high) | OpenAI | 42.9% | 84.2 tok/s | $3.44/M |
| #58 | Claude 4.5 Sonnet (Non-reasoning) | Anthropic | 42.8% | 44.2 tok/s | $6.00/M |
| #59 | Gemini 2.5 Pro | Google | 42.8% | 120.2 tok/s | $3.44/M |
| #60 | Nova 2.0 Pro Preview (medium) | Amazon | 42.7% | 112.7 tok/s | $3.44/M |
| #61 | GPT-5.1 Codex mini (high) | OpenAI | 42.6% | 207.2 tok/s | $0.688/M |
| #62 | MiniMax-M2.5 | MiniMax | 42.6% | 79.7 tok/s | $0.525/M |
| #63 | MiMo-V2-Pro | Xiaomi | 42.5% | n/a | - |
| #64 | DeepSeek V4 Pro (Non-reasoning) | DeepSeek | 42.4% | n/a | - |
| #65 | Kimi K2 Thinking | Kimi | 42.4% | 99 tok/s | $1.08/M |
| #66 | Qwen3 235B A22B 2507 (Reasoning) | Alibaba | 42.4% | 56 tok/s | $2.63/M |
| #67 | DeepSeek V4 Flash (Reasoning, High Effort) | DeepSeek | 42.0% | n/a | $0.175/M |
| #68 | Qwen3.5 122B A10B (Reasoning) | Alibaba | 42.0% | 139.9 tok/s | $1.10/M |
| #69 | Qwen3.5 397B A17B (Reasoning) | Alibaba | 42.0% | 50.4 tok/s | $1.35/M |
| #70 | Gemini 3.1 Flash-Lite Preview | Google | 41.9% | 332.5 tok/s | $0.563/M |
| #71 | Gemini 2.5 Pro Preview (May '25) | Google | 41.6% | n/a | $3.44/M |
| #72 | Hy3-preview (Reasoning) | Tencent | 41.2% | 86.4 tok/s | - |
| #73 | Gemma 4 31B (Non-reasoning) | Google | 41.1% | n/a | - |
| #74 | GPT-5 (medium) | OpenAI | 41.1% | 82.3 tok/s | $3.44/M |
| #75 | Qwen3.5 397B A17B (Non-reasoning) | Alibaba | 41.1% | 52.5 tok/s | $1.35/M |
| #76 | Cogito v2.1 (Reasoning) | Deep Cogito | 41.0% | 51.1 tok/s | $1.25/M |
| #77 | GPT-5 mini (medium) | OpenAI | 41.0% | 77.2 tok/s | $0.688/M |
| #78 | o3 | OpenAI | 41.0% | 72.7 tok/s | $3.50/M |
| #79 | Claude 4 Opus (Non-reasoning) | Anthropic | 40.9% | 36.6 tok/s | $30.00/M |
| #80 | Claude 4.1 Opus (Reasoning) | Anthropic | 40.9% | 35.8 tok/s | $30.00/M |
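A ranking like the table above can be produced with a short script over the snapshot data. This is a hedged sketch: the field names (`name`, `creator`, `scicode`) and the inline sample records are assumptions for illustration, not the actual Artificial Analysis schema or data feed.

```python
# Hypothetical sketch: rank models by SciCode score from a snapshot.
# Field names and sample records are assumptions, not the real schema.

snapshot = [
    {"name": "Kimi K2.6", "creator": "Kimi", "scicode": 0.535},
    {"name": "Gemini 3.1 Pro Preview", "creator": "Google", "scicode": 0.589},
    {"name": "Example Model", "creator": "Example Lab", "scicode": None},
]

def rank_by_scicode(models):
    """Return models sorted by SciCode score, highest first,
    skipping entries that do not report the metric."""
    scored = [m for m in models if m.get("scicode") is not None]
    return sorted(scored, key=lambda m: m["scicode"], reverse=True)

for i, m in enumerate(rank_by_scicode(snapshot), start=1):
    print(f"#{i} {m['name']} ({m['creator']}): {m['scicode']:.1%}")
```

Models without a reported score are dropped before sorting, mirroring how only the 472 models that report the metric appear in the ranking.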