CVPR 2021

Audio-Driven Emotional Video Portraits

Xinya Ji, Hang Zhou, Kaisiyuan Wang, Wayne Wu, Chen Change Loy, Xun Cao, Feng Xu

Abstract

[Figure 1: (a) Target ID-1; (b) Emotion Interpolation, from Sad to Happy]

Figure 1: Audio-Driven Emotional Video Portraits. Given an audio clip and a target video, our Emotional Video Portraits (EVP) approach generates emotion-controllable talking portraits and can change their emotion smoothly by interpolating in the latent space. (a) Generated video portraits with the same speech content but different emotions (i.e., contempt and sad). (b) Linear interpolation of the learned latent representation of emotions from sad to happy.
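Panel (b) relies on linear interpolation between two emotion codes, e(α) = (1 − α)·e_sad + α·e_happy for α in [0, 1]. The sketch below illustrates only that blending step, assuming emotions are represented as fixed-length latent vectors. The vector values, their dimension, and the helper names `interpolate_emotion` and `interpolation_path` are hypothetical and do not come from the EVP implementation. Each intermediate code would then be passed to the generator to render one frame or clip.

```python
from typing import List, Sequence


def interpolate_emotion(e_src: Sequence[float], e_tgt: Sequence[float], alpha: float) -> List[float]:
    """Hypothetical helper: linearly blend two emotion latents, (1 - alpha) * e_src + alpha * e_tgt."""
    if len(e_src) != len(e_tgt):
        raise ValueError("latent vectors must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * s + alpha * t for s, t in zip(e_src, e_tgt)]


def interpolation_path(e_src: Sequence[float], e_tgt: Sequence[float], steps: int) -> List[List[float]]:
    """Hypothetical helper: `steps` evenly spaced latents from e_src (alpha = 0) to e_tgt (alpha = 1)."""
    if steps < 2:
        raise ValueError("need at least two steps")
    return [interpolate_emotion(e_src, e_tgt, i / (steps - 1)) for i in range(steps)]


# Made-up 4-dimensional emotion codes, for illustration only.
e_sad = [1.0, 0.0, -0.5, 0.2]
e_happy = [0.0, 1.0, 0.5, 0.8]

path = interpolation_path(e_sad, e_happy, steps=5)
for i, code in enumerate(path):
    # Each code would condition the generator for one step of the sad-to-happy transition.
    print(i, [round(x, 3) for x in code])
```

Because the blend is linear, the midpoint (α = 0.5) is simply the average of the two codes. Smoother or perceptually uniform transitions would require a different schedule for α, which this sketch does not attempt.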