CVPR 2024

A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing

Maomao Li, Yu Li, Tianyu Yang, Yunfei Liu, Dongxu Yue, Zhihui Lin, Dong Xu

Abstract

Figure 1. We propose STEM inversion as an alternative to the commonly employed DDIM inversion for zero-shot video editing. STEM inversion achieves better temporal consistency in video reconstruction while preserving fine details. Moreover, it integrates seamlessly with contemporary video editing methods such as TokenFlow (TF) [10] and FateZero (FZ) [31], enhancing their editing capabilities. Best viewed zoomed in.