CVPR 2025

Hierarchical Flow Diffusion for Efficient Frame Interpolation

Yang Hai, Guo Wang, Tan Su, Wenjie Jiang, Yinlin Hu

Abstract

[Figure 1 panels, left to right: Input overlay; LDMVFI (8.3 s); CBBD (2.1 s); SGM-VFI (0.19 s); Ours (0.20 s); Ground truth]

Figure 1. Different methods for video frame interpolation. Most diffusion-based [15, 44] interpolation methods (LDMVFI [7], CBBD [32]) still lag well behind non-diffusion-based methods (SGM-VFI [28]) in both accuracy and efficiency. We propose a diffusion-based model that is more than 10 times faster than other diffusion-based methods and on par with SGM-VFI in efficiency. More importantly, we achieve significantly better accuracy than all baselines. Note how the baselines miss fine details and large motions, which our method recovers. We report inference time in seconds, measured on the same RTX-4090 GPU with a typical 1024×1024 image pair.