Easy Benchmarks: LLM model index
Data source: Artificial Analysis

OpenAI

o3

o3 is one of OpenAI's reasoning-focused models, built for harder multi-step tasks where deliberate problem solving matters more than simple chat completion. The benchmark snapshot highlights how that reasoning emphasis translates into scores, latency, and value versus general-purpose models.

Announcement: Introducing o3 and o4-mini

Operational Metrics

Output Speed: 72.7 tok/s
Time to First Token: 8.55 s
Blended Price: $3.50/M tokens
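The blended price is consistent with a weighted average of the input and output prices listed further down ($2.00/M and $8.00/M). A minimal sketch, assuming the common 3:1 input-to-output token weighting (the exact ratio used upstream is an assumption here, but it reproduces the $3.50/M figure):

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_weight: int = 3, output_weight: int = 1) -> float:
    """Weighted average of per-million-token prices.

    The 3:1 input:output weighting is assumed, not documented on this
    page; with o3's $2.00/M input and $8.00/M output prices it yields
    the listed $3.50/M blended price.
    """
    total_weight = input_weight + output_weight
    return (input_weight * input_per_m + output_weight * output_per_m) / total_weight

print(blended_price(2.00, 8.00))  # 3.5
```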

Model Metadata

Queryable facts extracted from the upstream model payload.

Release: Apr 16, 2025
Context Window: n/a
Modalities: n/a
API fields: release_date
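A minimal sketch of how the "queryable facts" above might be read out of an upstream payload. The payload shape and every field name other than `release_date` are hypothetical; missing fields fall back to the "n/a" shown in the metadata table:

```python
# Hypothetical payload: release_date is the only API field the page lists.
payload = {"release_date": "2025-04-16"}

def fact(payload: dict, key: str) -> str:
    """Render a payload field for display, using 'n/a' for absent values."""
    value = payload.get(key)
    return str(value) if value is not None else "n/a"

print(fact(payload, "release_date"))    # 2025-04-16
print(fact(payload, "context_window"))  # n/a
```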

Strength: MATH-500: 99.2% (rank #3 across 201 models)

Strength: AIME: 90.3% (rank #8 across 194 models)

Strength: LCB (LiveCodeBench): 80.8% (rank #26 across 343 models)

Watch Area: TTFT (time to first token): 8.55 s (rank #254 across 293 models)

Watch Area: Input Price: $2.00/M (rank #275 across 325 models)

Watch Area: Blended Price: $3.50/M (rank #268 across 325 models)

Strength Profile (chart): percentile score by analysis domain.

* Cost is inverted: lower input, output, and blended prices rank higher.

Benchmark Percentiles (chart): higher bars mean stronger relative placement.
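The percentile bars can be derived from the rank figures on this page. A sketch, assuming the straightforward rank-to-percentile mapping (the exact formula the site uses is an assumption); note that cost ranks are already inverted, so the cheapest model holds rank #1:

```python
def percentile_from_rank(rank: int, total: int) -> float:
    """Map a 1-based rank (best = 1) among `total` models to a 0-100 percentile."""
    return 100.0 * (total - rank) / (total - 1)

# MATH-500: rank #3 across 201 models.
print(percentile_from_rank(3, 201))               # 99.0
# Blended price: rank #268 across 325 models. Because cost ranking puts
# the cheapest model first, a high rank number means an expensive model,
# hence the low percentile.
print(round(percentile_from_rank(268, 325), 1))   # 17.6
```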

All Benchmarks

| Metric | Domain | Value | Rank |
| --- | --- | --- | --- |
| Artificial Analysis Intelligence Index | overall | 38.4 | #87 |
| Artificial Analysis Coding Index | coding | 38.4 | #57 |
| Artificial Analysis Math Index | math | 88.3 | #35 |
| MMLU-Pro | reasoning | 85.3% | #29 |
| GPQA | reasoning | 82.7% | #78 |
| Humanity's Last Exam | reasoning | 20.0% | #68 |
| LiveCodeBench | coding | 80.8% | #26 |
| SciCode | coding, reasoning | 41.0% | #78 |
| MATH-500 | math | 99.2% | #3 |
| AIME | math | 90.3% | #8 |
| Output Speed | speed | 72.7 tok/s | #173 |
| Time to First Token | speed | 8.55 s | #254 |
| Blended Price | cost | $3.50/M | #268 |
| Input Price | cost | $2.00/M | #275 |
| Output Price | cost | $8.00/M | #250 |
| Value Index | cost, overall | 11.0 | #237 |