CVPR 2025

PromptHMR: Promptable Human Mesh Recovery

Yufu Wang, Yu Sun, Priyanka Patel, Kostas Daniilidis, Michael J. Black, Muhammed Kocabas

Abstract

Figure 1. PromptHMR is a promptable human pose and shape (HPS) estimation method that processes images with spatial or semantic prompts. It takes "side information" readily available from vision-language models or user input to improve the accuracy and robustness of 3D HPS. PromptHMR recovers human pose and shape from spatial prompts such as (a) face bounding boxes, (b) partial or complete person detection boxes, or (c) segmentation masks. It refines its predictions using semantic prompts such as (c) person-person interaction labels for close contact scenarios, or (d) natural language descriptions of body shape to improve body shape predictions. Both image and video versions of PromptHMR achieve state-of-the-art accuracy.
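To make the prompt types listed in the caption concrete, the following is a minimal illustrative sketch of how per-person side information could be grouped into spatial and semantic prompts. The class name `PersonPrompt`, its field names, the box convention, and the boolean encoding of the interaction label are all assumptions made for illustration; they are not taken from the PromptHMR code or paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class PersonPrompt:
    """Hypothetical container for the side information given for one person."""

    # Spatial prompts (any subset may be provided).
    face_box: Optional[Box] = None          # face bounding box
    body_box: Optional[Box] = None          # partial or complete person box
    mask: Optional[List[List[int]]] = None  # binary segmentation mask, H x W

    # Semantic prompts (any subset may be provided).
    interaction: Optional[bool] = None      # assumed encoding: True = close contact
    shape_text: Optional[str] = None        # natural language body shape description

    def spatial_kinds(self) -> List[str]:
        """Return the names of the spatial prompts that are set."""
        names = ("face_box", "body_box", "mask")
        return [n for n in names if getattr(self, n) is not None]

    def semantic_kinds(self) -> List[str]:
        """Return the names of the semantic prompts that are set."""
        names = ("interaction", "shape_text")
        return [n for n in names if getattr(self, n) is not None]


# Example: a person specified by a face box plus a shape description.
prompt = PersonPrompt(face_box=(10.0, 20.0, 50.0, 60.0), shape_text="tall and slim")
```

Keeping every field optional mirrors the caption's point that different kinds of side information may be available for different people; how a model would encode and consume these prompts is outside the scope of this sketch.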