CVPR 2025

SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens

Chi Su, Xiaoxuan Ma, Jiajun Su, Yizhou Wang

Abstract

[Figure 1(b) plot: Mean Vertex Error (mm) versus inference time (ms). Methods shown: Ours (644*), ROMP (512), BEV (512), Multi-HMR (896), Multi-HMR (1288), AiOS (1333).]

Figure 1. (a) We propose scale-adaptive tokens in our one-stage framework for real-time multi-person 3D mesh estimation. The tokens are dynamically adjusted based on the relative size of individuals in the image so that features are encoded more efficiently, enabling real-time and accurate multi-person mesh estimation. We present a conceptual visualization of the scale-adaptive tokens. The right column visualizes the predicted meshes projected onto an image from the 3DPW [49] dataset and rendered from an elevated view. (b) Comparison of estimation error and inference time across different methods, with input resolutions in parentheses. Our method, using a mixed resolution with a base resolution of 644, achieves performance comparable to state-of-the-art methods on the AGORA [33] test set while maintaining real-time inference efficiency. Code and models are available at https://ChiSu001.github.io/SAT-HMR/ .