CVPR 2023

Synthesizing Photorealistic Virtual Humans Through Cross-Modal Disentanglement

Siddarth Ravichandran, Ondrej Texler, Dimitar Dinev, Hyun Jae Kang

Abstract

Figure 1. Three sequences of talking faces generated using the proposed framework. Our method produces sharp images with high texture quality and correct lip shapes that are in line with the spoken audio. As the zoomed-in patches show, our method faithfully reproduces fine textural and content details such as teeth.