CVPR 2023

MoFusion: A Framework for Denoising-Diffusion-Based Motion Synthesis

Rishabh Dabral, Muhammad Hamza Mughal, Vladislav Golyanik, Christian Theobalt

Abstract

Figure 1. Our MoFusion approach synthesises long sequences of human motions in 3D from textual and audio inputs (e.g., by providing music samples). Our model has significantly improved generalisability and realism, and can be conditioned on modalities like text and audio. The resulting dance movements match the rhythm of the conditioning music, even if it is outside the training distribution.

Panel labels: "A person jumps multiple times"; "A figure crawls on the floor"; "A person kicks with right leg"; "Walking counter-clockwise".