CVPR 2025

Reconstructing People, Places, and Cameras

Lea Müller, Hongsuk Choi, Anthony Zhang, Brent Yi, Jitendra Malik, Angjoo Kanazawa

Abstract

We propose Humans and Structure from Motion (HSfM), a method for the joint reconstruction of humans, scene point clouds, and cameras from an uncalibrated, sparse set of images depicting people. By explicitly incorporating humans into the traditional Structure from Motion (SfM) framework through 2D human keypoint correspondences, and by leveraging robust initialization from an off-the-shelf model for scene and camera reconstruction, our approach demonstrates that integrating these three elements (people, scenes, and cameras) improves the reconstruction accuracy of each component. Unlike prior work in SfM and human pose estimation, our method reconstructs metric-scale scene point clouds and camera parameters, informed by human mesh predictions, while situating human meshes in coherent world coordinates consistent with the surrounding environment, without any explicit contact constraints.
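To make the role of 2D human keypoint correspondences concrete, the sketch below computes a multi-view reprojection error for a set of 3D human joints. It is an illustration only and not the implementation used in this work: the pinhole camera model with a single focal length, the camera dictionary layout, and the names `project` and `keypoint_reprojection_error` are all assumptions introduced here.

```python
from typing import Dict, List, Sequence, Tuple

Point3 = Sequence[float]
Point2 = Tuple[float, float]


def project(point_world: Point3, camera: Dict) -> Point2:
    """Project a world-space 3D point into an (assumed) pinhole camera.

    `camera` is a hypothetical layout with keys:
      "R": 3x3 world-to-camera rotation (nested sequences),
      "t": world-to-camera translation (length 3),
      "focal": focal length in pixels,
      "principal": (cx, cy) principal point in pixels.
    """
    R, t = camera["R"], camera["t"]
    # Rigid transform from world to camera coordinates.
    xc = [sum(R[i][j] * point_world[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    cx, cy = camera["principal"]
    u = camera["focal"] * xc[0] / xc[2] + cx
    v = camera["focal"] * xc[1] / xc[2] + cy
    return (u, v)


def keypoint_reprojection_error(
    joints_world: List[Point3],
    cameras: List[Dict],
    keypoints_2d: List[List[Point2]],
) -> float:
    """Sum of squared pixel distances between projected 3D joints and the
    2D keypoints detected in every view (one keypoint list per camera)."""
    total = 0.0
    for camera, observed in zip(cameras, keypoints_2d):
        for joint, (u_obs, v_obs) in zip(joints_world, observed):
            u, v = project(joint, camera)
            total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    cam_a = {"R": identity, "t": [0.0, 0.0, 0.0], "focal": 500.0, "principal": (320.0, 240.0)}
    cam_b = {"R": identity, "t": [-1.0, 0.0, 0.0], "focal": 500.0, "principal": (320.0, 240.0)}
    joints = [(0.2, -0.1, 2.0)]
    # Observations generated from the same cameras, so the error is (near) zero.
    observations = [[project(joints[0], cam_a)], [project(joints[0], cam_b)]]
    print(keypoint_reprojection_error(joints, [cam_a, cam_b], observations))
```

Because the same 3D joints must explain the detections in every image, a residual of this kind ties the human pose and placement to the camera parameters. Minimizing it jointly over both is one way to read the statement that people and cameras constrain each other.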