CVPR 2025

Tora: Trajectory-oriented Diffusion Transformer for Video Generation

Zhenghao Zhang, Junchao Liao, Menghao Li, Zuozhuo Dai, Bingxue Qiu, Siyu Zhu, Long Qin, Weizhi Wang

Abstract

[Figure 1 panel label: "Visual & Trajectory". Prompts shown in the figure: "A delicate porcelain teacup floats upward on a linen tablecloth." "A vibrant apple bobs gently up and down on a hanging tree branch during a calm autumn afternoon." "Two roses, one purple, one yellow, sway together before a snow-covered mountain range." "A warm rustic lantern floats upward in a dimly lit room." "Butterflies flutter over rocks, set against an erupting volcano." "A stylish fox wearing sunglasses walks across the whimsical landscape of a candy-filled Mars."]

Figure 1. Tora is capable of generating videos guided by arbitrary trajectories, images, texts, or combinations thereof. Our proposed motion modules integrate seamlessly with the scalability of DiT, ensuring that the generated movements not only adhere precisely to the specified trajectory but also effectively emulate physical world dynamics.