NVIDIA
The model is an auto-regressive vision language model that uses an optimized transformer architecture. The model enables multi-image reasoning and video understanding, along with strong document intelligence, visual Q&A and summarization capabilities.
Rank #427 across 526
Rank #380 across 436
Rank #101 across 357
Chart: percentile score by analysis domain. Higher bars mean stronger relative placement. Cost is inverted: lower input, output, and blended prices rank higher.
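
The page does not say how a rank becomes a percentile score, so the following is only a minimal sketch under an assumed linear mapping: rank 1 maps to 100 and the last rank maps to 0. The helper name `percentile_from_rank` is hypothetical and not part of any catalog API. Because the cost ranks are already inverted (a cheaper model gets a better rank), no extra inversion is applied here.

```python
def percentile_from_rank(rank: int, total: int) -> float:
    """Map a 1-based rank to a 0-100 percentile.

    Assumption: a linear mapping where rank 1 -> 100 and rank == total -> 0.
    The source page may use a different formula.
    """
    if total < 1 or not 1 <= rank <= total:
        raise ValueError("rank must satisfy 1 <= rank <= total")
    if total == 1:
        return 100.0
    return 100.0 * (total - rank) / (total - 1)


# Rank/total pairs quoted on this page.
for rank, total in [(427, 526), (380, 436), (101, 357)]:
    print(f"#{rank} of {total}: {percentile_from_rank(rank, total):.1f}")
```

Under this assumption, a lower rank number always yields a higher percentile, which matches the "higher bars mean stronger relative placement" reading of the chart.
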
| Metric | Domain | Value | Rank |
|---|---|---|---|
| Artificial Analysis Intelligence Index | overall | 10.1 | #427 |
| Artificial Analysis Coding Index | coding | 5.9 | #380 |
| Artificial Analysis Math Index | math | 26.7 | #193 |
| MMLU-Pro | reasoning | 64.9% | #252 |
|  | reasoning | 43.9% | #390 |
| Humanity's Last Exam | reasoning | 4.5% | #380 |
| LiveCodeBench | coding | 34.5% | #202 |
| SciCode | coding, reasoning | 17.6% | #403 |
| Output Speed | speed | 223.7 tok/s | #23 |
| Time to First Token | speed | 0.55s | #54 |
| Blended Price | cost | $0.300/M | #101 |
| Input Price | cost | $0.200/M | #110 |
| Output Price | cost | $0.600/M | #106 |
| Value Index | cost, overall | 33.7 | #162 |
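
The three price rows in the table are consistent with a weighted average of input and output prices. The sketch below assumes a 3:1 input-to-output token weighting; the page itself does not state the ratio, and `blended_price` is a hypothetical helper used only for illustration.

```python
def blended_price(
    input_price: float,
    output_price: float,
    input_weight: float = 3.0,   # assumed weighting, not stated on the page
    output_weight: float = 1.0,  # assumed weighting, not stated on the page
) -> float:
    """Weighted average price in USD per million tokens."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Input and output prices from the table, in USD per million tokens.
blended = blended_price(0.20, 0.60)
print(f"${blended:.3f}/M")  # prints "$0.300/M"
```

With the listed prices, (3 × 0.200 + 1 × 0.600) / 4 = 0.300, which matches the Blended Price row. A different weighting would give a different figure, so the match supports the assumed ratio without confirming it.
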