ICML 2025

Direct Motion Models for Assessing Generated Videos

Kelsey R. Allen, Carl Doersch, Guangyao Zhou, Mohammed Suhail, Danny Driess, Ignacio Rocco, Yulia Rubanova, Thomas Kipf, Mehdi S. M. Sajjadi, Kevin Patrick Murphy, João Carreira, Sjoerd van Steenkiste


Abstract

A current limitation of generative video models is that they generate plausible-looking frames but poor motion, an issue that is not well captured by FVD and other popular methods for evaluating generated videos. Here we go beyond FVD by developing a metric that better measures plausible object interactions and motion. Our novel approach is based on auto-encoding point tracks and yields motion features that can be used not only to compare distributions of videos (as few as one generated and one ground truth, or as many as two datasets), but also to evaluate the motion of single videos. We show that using point tracks instead of pixel reconstruction or action recognition features results in a metric that is markedly more sensitive to temporal distortions in synthetic data, and that can predict human evaluations of temporal consistency and realism in generated videos obtained from open-source models better than a wide range of alternatives. We also show that by using a point track representation, we can spatiotemporally localize generative video inconsistencies, providing extra interpretability of generated video errors relative to prior work. An overview of the results and a link to the code can be found on the project page: trajan-paper.github.io.

1. At the distribution level, we show that among several other choices, including VideoMAE v2 (Wang et al., 2023b), I3D (Carreira and Zisserman, 2017), and motion his-

Author contributions: KA conceptualized the idea of using track-based latent motion features to measure video quality, moving from distributional metrics to per-video or video-video metrics. KA and SVS co-led the project and ran all the experiments. GZ invented an initial prototype of the TRAJAN architecture, which was further developed by CD. MS supplied the WALT checkpoints. All authors advised on the project direction and contributed to writing the paper.