CVPR 2023

Generating Holistic 3D Human Motion from Speech

Hongwei Yi, Hualin Liang, Yifei Liu, Qiong Cao, Yandong Wen, Timo Bolkart, Dacheng Tao, Michael J. Black

Abstract

Figure 1. Speech-to-motion translation example. Given a speech signal as input, our approach generates realistic, coherent, and diverse holistic body motions; that is, body motion together with facial expressions and hand gestures. From top to bottom: the input audio, the corresponding transcript, video frames, and the generated motions. Note that the audio is the only input to our approach; the transcript and video frames are shown for reference only.