Image-to-Video
Built upon an All-in-One product framework, the Kling 3.0 model series supports full multimodal input and output spanning text, images, audio, and video, bringing the understanding, generation, and editing of video together in one streamlined AI workflow. The models integrate multiple tasks, including text-to-video, image-to-video, reference-to-video, and in-video editing, into a single native multimodal architecture. This lets them follow complex narrative logic, deliver precise shot control, and maintain strong prompt adherence.
1,264 · Feb 2026 · video · $0.168/s
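The listed rate of $0.168/s is a price per second of generated video, so the cost of a clip is the rate multiplied by its duration. The sketch below illustrates that arithmetic. It assumes billing is strictly linear in output duration, with no rounding, minimum charge, or per-request fee; the page does not state these terms. The function name `estimate_cost` is hypothetical.

```python
# Listed rate from the catalog entry (USD per second of generated video).
PRICE_PER_SECOND_USD = 0.168


def estimate_cost(duration_s: float, price_per_s: float = PRICE_PER_SECOND_USD) -> float:
    """Hypothetical helper: estimate clip cost, ASSUMING purely linear per-second billing."""
    if duration_s < 0:
        raise ValueError("duration must be non-negative")
    # Round to 4 decimal places to hide floating-point noise (e.g. 1.6800000000000002).
    return round(duration_s * price_per_s, 4)


if __name__ == "__main__":
    for seconds in (5, 10):
        print(f"{seconds} s -> ${estimate_cost(seconds):.2f}")
```

Under that assumption, a 5-second clip comes to $0.84 and a 10-second clip to $1.68. Check the provider's billing terms before relying on these figures.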
Catalog
Category rows come directly from Artificial Analysis when the endpoint exposes category-level Elo scores.
| Category | Elo | 95% CI | Appearances |
|---|---|---|---|
| Abstract | 1,351 | -39/39 | 372 |
| Fantasy | 1,344 | -36/36 | 466 |
| Action | 1,342 | -27/27 | 751 |
| Screens | 1,308 | -37/37 | 406 |
| 3D animation | 1,306 | -34/34 | 489 |
| Buildings | 1,301 | -15/15 | 2,343 |
| Cartoon and anime | 1,299 | -39/39 | 375 |
| Transport | 1,283 | -19/19 | 1,442 |
| Animals | 1,281 | -25/25 | 818 |
| Moving camera | 1,280 | -15/15 | 2,492 |
| Food | 1,276 | -27/27 | 812 |
| Fashion | 1,273 | -18/18 | 1,747 |