Image-to-Video
Built on a unified multimodal architecture that generates audio and video jointly, Seedance 2.0 accepts four input modalities: text, image, audio, and video. Compared with Seedance 1.5, it delivers a substantial improvement in generation quality. It produces usable output more often in scenes with complex interactions and motion, and it significantly improves physical accuracy, visual realism, and controllability, making it well suited to high-quality creative work.
1,345
Mar 2026
video
$7/1M
Category rows come directly from Artificial Analysis when the endpoint exposes category-level Elo scores.
| Category | Elo | 95% CI | Appearances |
|---|---|---|---|
| Action | 1,482 | -31/31 | 743 |
| Abstract | 1,437 | -41/41 | 344 |
| Screens | 1,403 | -39/39 | 410 |
| Buildings | 1,390 | -16/16 | 2,337 |
| Transport | 1,380 | -21/21 | 1,379 |
| Moving camera | 1,377 | -16/16 | 2,433 |
| Cartoon and anime | 1,376 | -41/41 | 370 |
| Fantasy | 1,371 | -37/37 | 426 |
| Sports | 1,368 | -24/24 | 954 |
| Physics | 1,361 | -18/18 | 1,801 |
| People | 1,361 | -14/14 | 3,127 |
| Short prompt | 1,352 | -11/11 | 4,654 |
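To compare two categories, one option is to turn each row into an explicit 95% interval and check whether the intervals overlap. The sketch below assumes that the "95% CI" column gives the lower and upper offsets from the rating in Elo points (so `-31/31` means 1,482 − 31 to 1,482 + 31). The names `CategoryScore`, `parse_row`, and `intervals_overlap` are hypothetical helpers for illustration, not part of any Artificial Analysis API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CategoryScore:
    name: str
    elo: int
    ci_low: int   # assumed: distance below the rating, in Elo points
    ci_high: int  # assumed: distance above the rating, in Elo points
    appearances: int

    @property
    def interval(self) -> tuple[int, int]:
        # Explicit 95% interval under the offset interpretation above
        return (self.elo - self.ci_low, self.elo + self.ci_high)


def parse_row(row: str) -> CategoryScore:
    # "| Action | 1,482 | -31/31 | 743 |" -> ["Action", "1,482", "-31/31", "743"]
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    name, elo, ci, appearances = cells
    lower, upper = ci.split("/")
    return CategoryScore(
        name=name,
        elo=int(elo.replace(",", "")),
        ci_low=abs(int(lower)),
        ci_high=int(upper),
        appearances=int(appearances.replace(",", "")),
    )


def intervals_overlap(a: CategoryScore, b: CategoryScore) -> bool:
    # Two closed intervals overlap unless one ends before the other starts
    a_lo, a_hi = a.interval
    b_lo, b_hi = b.interval
    return a_lo <= b_hi and b_lo <= a_hi


if __name__ == "__main__":
    action = parse_row("| Action | 1,482 | -31/31 | 743 |")
    abstract = parse_row("| Abstract | 1,437 | -41/41 | 344 |")
    short = parse_row("| Short prompt | 1,352 | -11/11 | 4,654 |")
    print(action.interval)                     # (1451, 1513)
    print(intervals_overlap(action, abstract))  # True
    print(intervals_overlap(action, short))     # False
```

Under this interpretation, Action (1,451 to 1,513) and Short prompt (1,341 to 1,363) do not overlap, while Action and Abstract do. Non-overlapping intervals are a conservative signal of a real difference; overlapping intervals do not by themselves show that two categories score the same.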