CVPR 2025

The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion

Changan Chen, Juze Zhang, Shrinidhi K. Lakshmikanth, Yusu Fang, Ruizhi Shao, Gordon Wetzstein, Li Fei-Fei, Ehsan Adeli

Abstract

[Figure 1 panels: Co-speech gesture generation; Text to motion (example prompts: "Sit cross-legged", "A person is striking a golf ball"); Generative pre-training; Editable gesture generation; Emotion understanding (input: "What emotion is conveyed by the movements in the body motion?", output: "A person expresses the emotion of 'happiness'.")]

Figure 1. We introduce a language-model-based motion understanding and generation framework that takes in any of the audio/motion/text modalities and outputs the desired target modality. Coupled with our generative pre-training strategy, our model demonstrates competitive performance on an array of tasks, showing promising signs toward unified verbal and non-verbal language of human motions.