SciCode

Scientific coding benchmark score.

SciCode converts real scientific research problems into coding tasks. The project reports 80 main problems decomposed into 338 subproblems across scientific domains, requiring knowledge recall, reasoning, and code synthesis.

Test type: Research coding benchmark with scientist-authored solutions and test cases.
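
To make the scoring model concrete, here is a hedged sketch of how a SciCode-style subproblem is graded: a candidate solution is executed against scientist-authored test cases, and the subproblem counts as solved only if every assertion holds. The sample task, function names, and tolerance are illustrative assumptions, not taken from the actual benchmark harness.

```python
import math

def candidate_solution(radius: float) -> float:
    """Model-generated code for a toy subproblem (illustrative):
    area of a circle of the given radius."""
    return math.pi * radius ** 2

def run_subproblem_tests(solution) -> bool:
    """Scientist-authored test cases: the subproblem passes
    only if all assertions hold (hypothetical harness)."""
    cases = [(1.0, math.pi), (2.0, 4 * math.pi)]
    return all(abs(solution(r) - expected) < 1e-9 for r, expected in cases)

print(run_subproblem_tests(candidate_solution))  # → True for this candidate
```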

Coverage

472 models have this metric.

Current leader: Gemini 3.1 Pro Preview at 58.9%.

Project links

This app ranks the SciCode score exposed by the Artificial Analysis snapshot.

Official website · GitHub
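
Since the app simply ranks the SciCode score exposed by the snapshot, the core logic can be sketched as below. The snapshot structure and field names are assumptions for illustration, not the actual Artificial Analysis schema.

```python
# Hypothetical snapshot rows; models without the metric carry None.
snapshot = [
    {"model": "Gemini 3.1 Pro Preview", "scicode": 0.589},
    {"model": "GPT-5.4 (xhigh)", "scicode": 0.566},
    {"model": "No score reported", "scicode": None},
]

# Drop models missing the metric, then sort descending by score.
ranked = sorted(
    (m for m in snapshot if m["scicode"] is not None),
    key=lambda m: m["scicode"],
    reverse=True,
)

for rank, m in enumerate(ranked, start=1):
    print(f"#{rank} {m['model']} {m['scicode']:.1%}")
```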

Top SciCode Models

Top models ranked by SciCode score.

| Rank | Model | Creator | SciCode | Speed | Blended Price |
|------|-------|---------|---------|-------|---------------|
| #1 | Gemini 3.1 Pro Preview | Google | 58.9% | 131.2 tok/s | $4.50/M |
| #2 | GPT-5.4 (xhigh) | OpenAI | 56.6% | 93.5 tok/s | $5.63/M |
| #3 | Gemini 3 Pro Preview (high) | Google | 56.1% | 128.7 tok/s | $4.50/M |
| #4 | GPT-5.5 (xhigh) | OpenAI | 56.1% | 66.1 tok/s | $11.25/M |
| #5 | GPT-5.5 (high) | OpenAI | 55.9% | 59.3 tok/s | $11.25/M |
| #6 | GPT-5.2 Codex (xhigh) | OpenAI | 54.6% | 87.7 tok/s | $4.81/M |
| #7 | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | Anthropic | 54.5% | 51.8 tok/s | $10.00/M |
| #8 | GPT-5.5 (medium) | OpenAI | 53.5% | 57.5 tok/s | $11.25/M |
| #9 | Kimi K2.6 | Kimi | 53.5% | 29.1 tok/s | $1.71/M |
| #10 | GPT-5.3 Codex (xhigh) | OpenAI | 53.2% | 87.1 tok/s | $4.81/M |
| #11 | GPT-5.2 (xhigh) | OpenAI | 52.1% | 71.8 tok/s | $4.81/M |
| #12 | Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 51.9% | 49.9 tok/s | $10.00/M |
| #13 | GPT-5.5 (low) | OpenAI | 51.6% | 56.8 tok/s | $11.25/M |
| #14 | Muse Spark | Meta | 51.5% | n/a | - |
| #15 | Gemini 3 Flash Preview (Reasoning) | Google | 50.6% | 193.2 tok/s | $1.13/M |
| #16 | GPT-5.4 (low) | OpenAI | 50.3% | 59.1 tok/s | $5.63/M |
| #17 | MiMo-V2.5-Pro | Xiaomi | 50.2% | 59.9 tok/s | $1.50/M |
| #18 | Claude Opus 4.7 (Non-reasoning, High Effort) | Anthropic | 50.1% | 43 tok/s | $10.00/M |
| #19 | DeepSeek V4 Pro (Reasoning, Max Effort) | DeepSeek | 50.0% | 34.3 tok/s | $2.18/M |
| #20 | Gemini 3 Flash Preview (Non-reasoning) | Google | 49.9% | 178.3 tok/s | $1.13/M |
| #21 | Gemini 3 Pro Preview (low) | Google | 49.9% | n/a | $4.50/M |
| #22 | GPT-5.4 mini (xhigh) | OpenAI | 49.9% | 158.9 tok/s | $1.69/M |
| #23 | Claude Opus 4.5 (Reasoning) | Anthropic | 49.5% | 57 tok/s | $10.00/M |
| #24 | Kimi K2.5 (Reasoning) | Kimi | 49.0% | 31.6 tok/s | $1.20/M |
| #25 | GPT-5.5 (Non-reasoning) | OpenAI | 47.3% | 51.3 tok/s | $11.25/M |
| #26 | GPT-5.4 (Non-reasoning) | OpenAI | 47.1% | 57.2 tok/s | $5.63/M |
| #27 | Claude Opus 4.5 (Non-reasoning) | Anthropic | 47.0% | 50.3 tok/s | $10.00/M |
| #28 | MiniMax-M2.7 | MiniMax | 47.0% | 43.9 tok/s | $0.525/M |
| #29 | Claude Sonnet 4.6 (Non-reasoning, High Effort) | Anthropic | 46.9% | 48.3 tok/s | $6.00/M |
| #30 | GPT-5.4 nano (xhigh) | OpenAI | 46.9% | 160.3 tok/s | $0.463/M |
| #31 | Qwen3.6 Max Preview | Alibaba | 46.9% | 33.2 tok/s | $2.93/M |
| #32 | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 46.8% | 68 tok/s | $6.00/M |
| #33 | o4-mini (high) | OpenAI | 46.5% | 124.5 tok/s | $1.93/M |
| #34 | DeepSeek V4 Pro (Reasoning, High Effort) | DeepSeek | 46.4% | 32.9 tok/s | $2.18/M |
| #35 | GLM-5 (Reasoning) | Z AI | 46.2% | 64.5 tok/s | $1.55/M |
| #36 | GPT-5.2 (medium) | OpenAI | 46.2% | n/a | $4.81/M |
| #37 | Claude Opus 4.6 (Non-reasoning, High Effort) | Anthropic | 45.7% | 42 tok/s | $10.00/M |
| #38 | Grok 4 | xAI | 45.7% | 50.3 tok/s | $6.00/M |
| #39 | Grok 4.20 0309 v2 (Reasoning) | xAI | 45.6% | 89.3 tok/s | $3.00/M |
| #40 | GLM-4.7 (Reasoning) | Z AI | 45.1% | 90.3 tok/s | $1.00/M |
| #41 | DeepSeek V4 Flash (Reasoning, Max Effort) | DeepSeek | 44.9% | 77.4 tok/s | $0.175/M |
| #42 | Claude 4.5 Sonnet (Reasoning) | Anthropic | 44.7% | 43.8 tok/s | $6.00/M |
| #43 | Grok 4.20 0309 (Reasoning) | xAI | 44.7% | 87.8 tok/s | $3.00/M |
| #44 | GPT-5.4 mini (medium) | OpenAI | 44.2% | 159.2 tok/s | $1.69/M |
| #45 | Grok 4 Fast (Reasoning) | xAI | 44.2% | 76.2 tok/s | $0.275/M |
| #46 | Grok 4.1 Fast (Reasoning) | xAI | 44.2% | 140.9 tok/s | $0.275/M |
| #47 | Claude Sonnet 4.6 (Non-reasoning, Low Effort) | Anthropic | 44.1% | 51.5 tok/s | $6.00/M |
| #48 | DeepSeek V3.2 Speciale | DeepSeek | 44.0% | n/a | - |
| #49 | GLM-5.1 (Reasoning) | Z AI | 43.8% | 45.7 tok/s | $2.15/M |
| #50 | GLM-5-Turbo | Z AI | 43.6% | n/a | - |
| #51 | GLM 5V Turbo (Reasoning) | Z AI | 43.5% | n/a | - |
| #52 | Gemma 4 31B (Reasoning) | Google | 43.4% | 34.8 tok/s | - |
| #53 | Claude 4.5 Haiku (Reasoning) | Anthropic | 43.3% | 103.8 tok/s | $2.00/M |
| #54 | GPT-5.1 (high) | OpenAI | 43.3% | 123.3 tok/s | $3.44/M |
| #55 | MiMo-V2.5 | Xiaomi | 43.1% | n/a | - |
| #56 | Qwen3 Max Thinking | Alibaba | 43.1% | 34.3 tok/s | $2.40/M |
| #57 | GPT-5 (high) | OpenAI | 42.9% | 84.2 tok/s | $3.44/M |
| #58 | Claude 4.5 Sonnet (Non-reasoning) | Anthropic | 42.8% | 44.2 tok/s | $6.00/M |
| #59 | Gemini 2.5 Pro | Google | 42.8% | 120.2 tok/s | $3.44/M |
| #60 | Nova 2.0 Pro Preview (medium) | Amazon | 42.7% | 112.7 tok/s | $3.44/M |
| #61 | GPT-5.1 Codex mini (high) | OpenAI | 42.6% | 207.2 tok/s | $0.688/M |
| #62 | MiniMax-M2.5 | MiniMax | 42.6% | 79.7 tok/s | $0.525/M |
| #63 | MiMo-V2-Pro | Xiaomi | 42.5% | n/a | - |
| #64 | DeepSeek V4 Pro (Non-reasoning) | DeepSeek | 42.4% | n/a | - |
| #65 | Kimi K2 Thinking | Kimi | 42.4% | 99 tok/s | $1.08/M |
| #66 | Qwen3 235B A22B 2507 (Reasoning) | Alibaba | 42.4% | 56 tok/s | $2.63/M |
| #67 | DeepSeek V4 Flash (Reasoning, High Effort) | DeepSeek | 42.0% | n/a | $0.175/M |
| #68 | Qwen3.5 122B A10B (Reasoning) | Alibaba | 42.0% | 139.9 tok/s | $1.10/M |
| #69 | Qwen3.5 397B A17B (Reasoning) | Alibaba | 42.0% | 50.4 tok/s | $1.35/M |
| #70 | Gemini 3.1 Flash-Lite Preview | Google | 41.9% | 332.5 tok/s | $0.563/M |
| #71 | Gemini 2.5 Pro Preview (May '25) | Google | 41.6% | n/a | $3.44/M |
| #72 | TEHy3-preview (Reasoning) | Tencent | 41.2% | 86.4 tok/s | - |
| #73 | Gemma 4 31B (Non-reasoning) | Google | 41.1% | n/a | - |
| #74 | GPT-5 (medium) | OpenAI | 41.1% | 82.3 tok/s | $3.44/M |
| #75 | Qwen3.5 397B A17B (Non-reasoning) | Alibaba | 41.1% | 52.5 tok/s | $1.35/M |
| #76 | Cogito v2.1 (Reasoning) | Deep Cogito | 41.0% | 51.1 tok/s | $1.25/M |
| #77 | GPT-5 mini (medium) | OpenAI | 41.0% | 77.2 tok/s | $0.688/M |
| #78 | o3 | OpenAI | 41.0% | 72.7 tok/s | $3.50/M |
| #79 | Claude 4 Opus (Non-reasoning) | Anthropic | 40.9% | 36.6 tok/s | $30.00/M |
| #80 | Claude 4.1 Opus (Reasoning) | Anthropic | 40.9% | 35.8 tok/s | $30.00/M |
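
Since the table pairs each SciCode score with a blended price, a rough cost-effectiveness ranking can be derived from it. This is an illustrative sketch: the three rows are copied from the table above, and the "points per dollar" metric is an assumption of this example, not something the leaderboard itself defines.

```python
# (model, SciCode score in %, blended price in $/M tokens) from the table.
rows = [
    ("Gemini 3.1 Pro Preview", 58.9, 4.50),
    ("GPT-5.5 (xhigh)", 56.1, 11.25),
    ("Kimi K2.6", 53.5, 1.71),
]

# Rank by score points per dollar of blended price (descending).
by_value = sorted(rows, key=lambda r: r[1] / r[2], reverse=True)
for name, score, price in by_value:
    print(f"{name}: {score / price:.1f} pts per $/M")
```

On these three rows the cheaper Kimi K2.6 comes out ahead on this metric despite its lower raw score, which is the usual shape of such score-per-dollar comparisons.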