CVPR 2025

InterDyn: Controllable Interactive Dynamics with Video Diffusion Models

Rick Akkerman, Haiwen Feng, Michael J. Black, Dimitrios Tzionas, Victoria Fernández Abrevaya

Abstract

[Figure 1 panels. Top: input image; video generation by InterDyn using only the hand mask sequence as control signal. Bottom: force propagation; counterfactual dynamics (Future #1, Future #2). Frame labels: t=0, t=3, t=13. Legend: driving object motion; tracks indicate an object with generated uncontrolled dynamics.]

Figure 1. We present InterDyn, a framework for synthesizing realistic interactive dynamics without 3D reconstruction and physics simulation. Our core principle is to rely on the implicit physics knowledge embedded in large-scale video generative models. Given an image and a "driving motion", our model generates the consequential scene dynamics. We investigate the generated interactive dynamics in a simple object collision scenario (bottom) and complex in-the-wild human-object interaction (top).