CVPR 2025

MixerMDM: Learnable Composition of Human Motion Diffusion Models

Pablo Ruiz-Ponce, Germán Barquero, Cristina Palmero, Sergio Escalera, José García Rodríguez

Abstract

[Figure 1: "Mixed Motion" example for the prompt "Two persons are in an intense boxing match while suddenly one person throws a kick", with two plots whose axes are labeled Denoising Step and Mixing Weight.]

Figure 1. We introduce MixerMDM, the first learnable model composition technique for combining pre-trained text-conditioned human motion diffusion models. MixerMDM has demonstrated a consistent ability to generate highly controllable human interactions by combining a model that generates individual motions from textual descriptions with a model that creates human-human interactions.