Text-to-Video
State-of-the-art video generation across quality, cost, and latency. Grok Imagine is xAI's most powerful video-audio generative model yet. Bring an image to life, start from a simple text prompt, or refine a complex cinematic sequence.
1,236
Jan 2026
video
$0.05/s
Catalog
Category rows come directly from Artificial Analysis when the endpoint exposes category-level Elo scores.
| Category | Elo | 95% CI | Appearances |
|---|---|---|---|
| Cartoon and anime | 1,337 | -24/24 | 837 |
| Fantasy | 1,325 | -29/29 | 555 |
| Action | 1,318 | -22/22 | 952 |
| Sci Fi | 1,310 | -20/20 | 1,251 |
| Sports | 1,305 | -24/24 | 772 |
| Long prompt | 1,290 | -22/22 | 1,028 |
| Fashion | 1,290 | -26/26 | 716 |
| Multi-scene | 1,280 | -29/29 | 563 |
| 3D animation | 1,280 | -39/39 | 252 |
| People | 1,279 | -11/11 | 3,809 |
| Specific location or era | 1,275 | -20/20 | 1,118 |
| Text | 1,263 | -26/26 | 691 |
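
As a rough aid for reading the Elo column, the sketch below converts a rating gap into an expected head-to-head win rate. It assumes the conventional logistic Elo formula with a 400-point scale; the source does not state how Artificial Analysis fits its scores, so treat the scale, the function name `expected_score`, and the opponent rating as illustrative assumptions rather than the leaderboard's actual method.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected win rate of A over B under the conventional Elo model.

    Assumption: logistic curve with a 400-point scale. The leaderboard
    may use a different fitting procedure.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    # Hypothetical matchup: a model rated 1,337 (the "Cartoon and anime"
    # score above) against an imagined opponent rated 1,263.
    p = expected_score(1337, 1263)
    print(f"Expected win rate: {p:.2f}")
```

Under this assumption, equal ratings give an expected win rate of 0.5, and a gap of a few dozen points shifts the odds only modestly. That is worth keeping in mind for rows whose 95% CI is wide, such as 3D animation.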