CVPR 2025

IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos

Yuan Li, Ziqian Bai, Feitong Tan, Zhaopeng Cui, Sean Fanello, Yinda Zhang

2 Google

Abstract

[Figure 1 panels: for each example, Inputs (ref., target exp.) on the left and Generated Results (novel views, disparity) on the right.]

Figure 1. We propose a 3D-aware video diffusion model for talking head synthesis. Given an image as identity and a sequence of tracking signals (shown on the left for each example), our model directly generates videos as Multiplane Images (MPIs) in a single denoising process, which are ready for efficient novel-view rendering. This enables an immersive viewing experience, e.g., rendering binocular stereo or perspective distortion in a VR headset.
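To make the MPI representation in Figure 1 concrete, the following is a minimal sketch of how a stack of fronto-parallel RGBA planes can be flattened into an image and a disparity map with the standard back-to-front "over" operator. It is not taken from the paper: the function names `composite_mpi` and `expected_disparity`, the array layout, and the per-plane disparity values are all assumptions made for illustration, and the per-plane homography warp needed to render an actual novel view is omitted.

```python
import numpy as np


def composite_mpi(colors, alphas):
    """Back-to-front 'over' compositing of MPI planes (illustrative only).

    colors: (D, H, W, 3) RGB per plane, ordered far to near (assumed layout).
    alphas: (D, H, W, 1) opacity per plane in [0, 1].
    Returns an (H, W, 3) image.
    """
    image = np.zeros(colors.shape[1:], dtype=np.float64)
    for rgb, alpha in zip(colors, alphas):
        # Each nearer plane covers what is behind it in proportion to alpha.
        image = rgb * alpha + image * (1.0 - alpha)
    return image


def expected_disparity(alphas, disparities):
    """Composite per-plane disparity values with the same 'over' operator.

    disparities: one scalar per plane, ordered far to near (hypothetical values).
    Returns an (H, W) disparity map.
    """
    disp = np.zeros(alphas.shape[1:], dtype=np.float64)
    for alpha, d in zip(alphas, disparities):
        disp = d * alpha + disp * (1.0 - alpha)
    return disp[..., 0]


# Toy example: an opaque red far plane behind a half-transparent green near plane.
colors = np.zeros((2, 1, 1, 3))
colors[0, ..., 0] = 1.0  # far plane: red
colors[1, ..., 1] = 1.0  # near plane: green
alphas = np.array([1.0, 0.5]).reshape(2, 1, 1, 1)

image = composite_mpi(colors, alphas)
disparity = expected_disparity(alphas, [0.1, 1.0])
```

In a full renderer, each plane would first be warped into the target camera by a homography determined by the plane depth and the relative camera pose, and the warped planes would then be composited exactly as above. Because that step involves only warps and blends, a generated MPI can be re-rendered cheaply for both eyes of a stereo display.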