CVPR 2025

Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation

Xin Yan, Yuxuan Cai, Qiuyue Wang, Yuan Zhou, Wenhao Huang, Huan Yang

Abstract

[Figure 1 panels, example text prompts: (1) "A young boy rides a bicycle down the long corridors of a towering ancient castle. The camera follows closely as he moves. As he exits the castle, the scene opens up to reveal a lush, vibrant garden filled with greenery and flowers, sunlight pouring over the landscape." (2) "An extreme close-up shot of an ant emerging from its nest. The camera pulls back revealing a neighborhood beyond the hill."]

Figure 1. Presto can generate long videos with rich content and long-range coherence.