Easy Benchmarks

SciCode

Scientific coding benchmark score.

SciCode converts real scientific research problems into coding tasks. The project reports 80 main problems decomposed into 338 subproblems across scientific domains, requiring knowledge recall, reasoning, and code synthesis.

Test type: Research coding benchmark with scientist-authored solutions and test cases.

Coverage

498 models have this metric.

Current leader: Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback), at 60.2%

Project links

This app ranks the SciCode score exposed by the Artificial Analysis snapshot.

Official website | GitHub

Top SciCode Models

Top models ranked by SciCode.

Leaderboard

Rank | Model | Creator | Value | Speed | Blended Price
#1 | Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) | Anthropic | 60.2% | n/a | $20.00/M
#2 | Gemini 3.1 Pro Preview | Google | 58.9% | 124.7 tok/s | $4.50/M
#3 | GPT-5.4 (xhigh) | OpenAI | 56.6% | 75.5 tok/s | $5.63/M
#4 | Gemini 3 Pro Preview (high) | Google | 56.1% | n/a | $4.50/M
#5 | GPT-5.5 (xhigh) | OpenAI | 56.1% | 69 tok/s | $11.25/M
#6 | GPT-5.5 (high) | OpenAI | 55.9% | 61.6 tok/s | $11.25/M
#7 | GPT-5.2 Codex (xhigh) | OpenAI | 54.6% | 105.3 tok/s | $4.81/M
#8 | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | Anthropic | 54.5% | 53.8 tok/s | $10.00/M
#9 | Claude Opus 4.8 (Adaptive Reasoning, Max Effort) | Anthropic | 53.5% | 67.8 tok/s | $10.00/M
#10 | GPT-5.5 (medium) | OpenAI | 53.5% | 58.7 tok/s | $11.25/M
#11 | Kimi K2.6 | Kimi | 53.5% | 41.6 tok/s | $1.71/M
#12 | GPT-5.3 Codex (xhigh) | OpenAI | 53.2% | 84.5 tok/s | $4.81/M
#13 | Gemini 3.5 Flash (high) | Google | 53.1% | 203.3 tok/s | $3.38/M
#14 | Gemini 3.5 Flash (medium) | Google | 53.0% | 210.1 tok/s | $3.38/M
#15 | GPT-5.2 (xhigh) | OpenAI | 52.1% | 71 tok/s | $4.81/M
#16 | Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 51.9% | 47.3 tok/s | $10.94/M
#17 | GPT-5.5 (low) | OpenAI | 51.6% | 66.4 tok/s | $11.25/M
#18 | Muse Spark | Meta | 51.5% | n/a | -
#19 | Gemini 3 Flash Preview (Reasoning) | Google | 50.6% | 172.8 tok/s | $1.13/M
#20 | GPT-5.4 (low) | OpenAI | 50.3% | 63.6 tok/s | $5.63/M
#21 | GPT-5.5 Instant (May 2026) | OpenAI | 50.3% | n/a | $11.25/M
#22 | MiMo-V2.5-Pro | Xiaomi | 50.2% | 43.3 tok/s | $0.544/M
#23 | Claude Opus 4.7 (Non-reasoning, High Effort) | Anthropic | 50.1% | 46 tok/s | $10.00/M
#24 | DeepSeek V4 Pro (Reasoning, Max Effort) | DeepSeek | 50.0% | 61.6 tok/s | $0.544/M
#25 | Gemini 3 Flash Preview (Non-reasoning) | Google | 49.9% | 181.3 tok/s | $1.13/M
#26 | Gemini 3 Pro Preview (low) | Google | 49.9% | n/a | $4.50/M
#27 | GPT-5.4 mini (xhigh) | OpenAI | 49.9% | 178.8 tok/s | $1.69/M
#28 | Claude Opus 4.5 (Reasoning) | Anthropic | 49.5% | 53.5 tok/s | $10.94/M
#29 | Kimi K2.5 (Reasoning) | Kimi | 49.0% | 31.7 tok/s | $1.19/M
#30 | Gemini 3.5 Flash (minimal) | Google | 48.8% | 202.7 tok/s | $3.38/M
#31 | Qwen3.7 Max | Alibaba | 48.8% | 186.5 tok/s | $3.75/M
#32 | GPT-5.5 (Non-reasoning) | OpenAI | 47.3% | 54.4 tok/s | $11.25/M
#33 | Grok 4.3 (high) | xAI | 47.3% | 159.7 tok/s | $1.56/M
#34 | GPT-5.4 (Non-reasoning) | OpenAI | 47.1% | 59.3 tok/s | $5.63/M
#35 | Claude Opus 4.5 (Non-reasoning) | Anthropic | 47.0% | 47.6 tok/s | $10.94/M
#36 | MiniMax-M2.7 | MiniMax | 47.0% | 75 tok/s | $0.525/M
#37 | Claude Sonnet 4.6 (Non-reasoning, High Effort) | Anthropic | 46.9% | 49.1 tok/s | $6.00/M
#38 | GPT-5.4 nano (xhigh) | OpenAI | 46.9% | 147.6 tok/s | $0.463/M
#39 | Qwen3.6 Max Preview | Alibaba | 46.9% | 40.9 tok/s | $2.93/M
#40 | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 46.8% | 63.2 tok/s | $6.00/M
#41 | o4-mini (high) | OpenAI | 46.5% | 151 tok/s | $1.93/M
#42 | DeepSeek V4 Pro (Reasoning, High Effort) | DeepSeek | 46.4% | 65.7 tok/s | $0.544/M
#43 | GLM-5 (Reasoning) | Z AI | 46.2% | 79.5 tok/s | $1.55/M
#44 | GPT-5.2 (medium) | OpenAI | 46.2% | n/a | $4.81/M
#45 | Claude Opus 4.6 (Non-reasoning, High Effort) | Anthropic | 45.7% | 40.9 tok/s | $10.94/M
#46 | Grok 4 | xAI | 45.7% | n/a | $11.00/M
#47 | Grok 4.20 0309 v2 (Reasoning) | xAI | 45.6% | 168.7 tok/s | $3.00/M
#48 | Qwen3.7 Plus | Alibaba | 45.5% | 53.6 tok/s | $0.590/M
#49 | MiniMax-M3 | MiniMax | 45.4% | 45.6 tok/s | $0.525/M
#50 | GLM-4.7 (Reasoning) | Z AI | 45.1% | 79.2 tok/s | $1.00/M
#51 | DeepSeek V4 Flash (Reasoning, Max Effort) | DeepSeek | 44.9% | 98.3 tok/s | $0.175/M
#52 | Claude 4.5 Sonnet (Reasoning) | Anthropic | 44.7% | 50.1 tok/s | $6.56/M
#53 | Grok 4.20 0309 (Reasoning) | xAI | 44.7% | 166.5 tok/s | $3.00/M
#54 | Grok 4.3 (medium) | xAI | 44.6% | 136.9 tok/s | $1.56/M
#55 | GPT-5.4 mini (medium) | OpenAI | 44.2% | 177.9 tok/s | $1.69/M
#56 | Grok 4 Fast (Reasoning) | xAI | 44.2% | n/a | $0.275/M
#57 | Grok 4.1 Fast (Reasoning) | xAI | 44.2% | n/a | -
#58 | Claude Sonnet 4.6 (Non-reasoning, Low Effort) | Anthropic | 44.1% | 50.1 tok/s | $6.00/M
#59 | DeepSeek V3.2 Speciale | DeepSeek | 44.0% | n/a | -
#60 | GLM-5.1 (Reasoning) | Z AI | 43.8% | 46.8 tok/s | $2.15/M
#61 | GLM-5-Turbo | Z AI | 43.6% | n/a | -
#62 | GLM 5V Turbo (Reasoning) | Z AI | 43.5% | n/a | -
#63 | Gemma 4 31B (Reasoning) | Google | 43.4% | 34.8 tok/s | -
#64 | Claude 4.5 Haiku (Reasoning) | Anthropic | 43.3% | 148.3 tok/s | $2.00/M
#65 | GPT-5.1 (high) | OpenAI | 43.3% | 121.2 tok/s | $3.44/M
#66 | MiMo-V2.5 | Xiaomi | 43.1% | 77.4 tok/s | $0.175/M
#67 | Qwen3 Max Thinking | Alibaba | 43.1% | n/a | $2.40/M
#68 | GPT-5 (high) | OpenAI | 42.9% | 111.1 tok/s | $3.44/M
#69 | Claude 4.5 Sonnet (Non-reasoning) | Anthropic | 42.8% | 42.3 tok/s | $6.56/M
#70 | Gemini 2.5 Pro | Google | 42.8% | 132 tok/s | $3.44/M
#71 | Nova 2.0 Pro Preview (medium) | Amazon | 42.7% | 127.7 tok/s | $3.44/M
#72 | GPT-5.1 Codex mini (high) | OpenAI | 42.6% | 213.6 tok/s | $0.688/M
#73 | MiniMax-M2.5 | MiniMax | 42.6% | 202.9 tok/s | $0.525/M
#74 | MiMo-V2-Pro | Xiaomi | 42.5% | 42.5 tok/s | $1.50/M
#75 | DeepSeek V4 Pro (Non-reasoning) | DeepSeek | 42.4% | 67 tok/s | $0.544/M
#76 | Kimi K2 Thinking | Kimi | 42.4% | 131.1 tok/s | $1.08/M
#77 | Qwen3 235B A22B 2507 (Reasoning) | Alibaba | 42.4% | 59.4 tok/s | $0.838/M
#78 | Ring-2.6-1T | InclusionAI | 42.4% | 122.1 tok/s | $0.850/M
#79 | DeepSeek V4 Flash (Reasoning, High Effort) | DeepSeek | 42.0% | n/a | $0.175/M
#80 | Qwen3.5 122B A10B (Reasoning) | Alibaba | 42.0% | 143.6 tok/s | $1.10/M