CVPR 2025
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
Rick Akkerman, Haiwen Feng, Michael J. Black, Dimitrios Tzionas, Victoria Fernández Abrevaya
Abstract
[Figure 1 panels. Top: "Input image"; "Video generation by InterDyn using only the hand mask sequence as control signal". Bottom: "Force propagation"; "Counterfactual dynamics" ("Future #1", "Future #2"). Legend: one marker denotes a driving object motion; tracks indicate an object with generated uncontrolled dynamics. Frame labels: t=0, t=3, t=13.]

Figure 1. We present InterDyn, a framework for synthesizing realistic interactive dynamics without 3D reconstruction and physics simulation. Our core principle is to rely on the implicit physics knowledge embedded in large-scale video generative models. Given an image and a "driving motion", our model generates the consequential scene dynamics. We investigate the generated interactive dynamics in a simple object collision scenario (bottom) and complex in-the-wild human-object interaction (top).