CVPR 2024

Generative Image Dynamics

Zhengqi Li, Richard Tucker, Noah Snavely, Aleksander Holynski

Abstract

[Figure 1 panels: input picture; spectral volume (image-space modal basis), with X and Y coefficients at frequencies including 0.2 Hz, 0.4 Hz, and 3.0 Hz; looping video; interactive dynamics.]

Figure 1. We model a generative image-space prior on scene motion: from a single RGB image, our method generates a spectral volume [23], a motion representation that models dense, long-term pixel trajectories in the Fourier domain. Our learned motion priors can be used to turn a single picture into a seamlessly looping video, or into an interactive simulation of dynamics that responds to user inputs like dragging and releasing points. On the right, we visualize output videos as space-time X-t slices (along the input scanline shown on the left).
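To make the idea of pixel trajectories stored in the Fourier domain more concrete, here is a minimal sketch of how per-pixel Fourier coefficients could be converted back into time-domain displacements with an inverse FFT. It is an illustration under stated assumptions, not the authors' implementation: the function name `spectral_volume_to_displacements`, the `(K, H, W)` array layout, the zero DC term, and the toy coefficient values are all hypothetical.

```python
import numpy as np


def spectral_volume_to_displacements(spec_x, spec_y, num_frames):
    """Convert per-pixel Fourier coefficients into time-domain displacements.

    spec_x, spec_y: complex arrays of shape (K, H, W) holding coefficients
        for frequency bins 1..K (assumed layout; the DC term is taken as 0).
    num_frames: number of output frames T; must satisfy 2 * K < T.
    Returns a real array of shape (T, H, W, 2) with (dx, dy) per pixel.
    """
    spec_x = np.asarray(spec_x, dtype=complex)
    spec_y = np.asarray(spec_y, dtype=complex)
    if spec_x.shape != spec_y.shape:
        raise ValueError("spec_x and spec_y must have the same shape")
    num_freqs, height, width = spec_x.shape
    if 2 * num_freqs >= num_frames:
        raise ValueError("need 2 * K < num_frames to stay below Nyquist")

    # One-sided spectrum expected by irfft: bins 0 .. T // 2.
    n_bins = num_frames // 2 + 1
    full = np.zeros((2, n_bins, height, width), dtype=complex)
    full[0, 1:num_freqs + 1] = spec_x
    full[1, 1:num_freqs + 1] = spec_y

    # Inverse real FFT along the frequency axis gives T real samples.
    disp = np.fft.irfft(full, n=num_frames, axis=1)  # shape (2, T, H, W)
    return np.moveaxis(disp, 0, -1)  # shape (T, H, W, 2)


# Toy example: a 2x3 "image" where only pixel (0, 0) moves horizontally.
T, amplitude = 60, 3.0
spec_x = np.zeros((1, 2, 3), dtype=complex)
spec_y = np.zeros((1, 2, 3), dtype=complex)
# This coefficient yields amplitude * sin(2 * pi * t / T) at pixel (0, 0).
spec_x[0, 0, 0] = -0.5j * T * amplitude
disp = spectral_volume_to_displacements(spec_x, spec_y, T)
```

Any trajectory synthesized this way repeats with period `num_frames`, because the inverse discrete Fourier transform is periodic. Whether the wrap-around looks seamless depends on the coefficients, and this sketch makes no claim about how the paper produces its looping videos.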