CVPR 2025
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
Minghong Cai, Xiaodong Cun, Xiaoyu Li, Wenze Liu, Zhaoyang Zhang, Yong Zhang, Ying Shan, Xiangyu Yue
SHIAE, CUHK
Figure 1. Our method DiTCtrl takes multiple text prompts as input and demonstrates superior capability in generating longer videos with multiple events, long-range coherence and smooth transitions as output. Example prompts: (a) "Athlete glides across ocean waters snow mountain sand dunes"; (b) "Frosty pine: close-up shot medium shot forest vista, cinematic".
Abstract