NeurIPS 2025

Rooms from Motion: Un-posed Indoor 3D Object Detection as Localization and Mapping

Justin Lazarow, Kai Kang, Afshin Dehghan

Abstract

Rooms from Motion is an object-centric framework for metric localization and semantic 3D object-level mapping from un-posed RGB images, without the need for explicit 2D keypoints or point clouds. Given an unordered collection of images, Rooms from Motion detects every object as a metric 3D box within each image, uses a learned object matcher to associate objects across frames, estimates relative poses from the 3D boxes of matched objects, and finally estimates absolute camera poses and forms global, semantic 3D object tracks (akin to 3D object detection).

Above: We visualize the semantics-aware map and camera localization produced by Rooms from Motion from un-posed RGB images on two challenging ScanNet++ scenes: a large laboratory and a few rooms within a residential space. Below: We show class-agnostic results on an open space from the CA-1M dataset.
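To make the relative-pose step concrete, the following is a minimal sketch of one way a rigid transform between two cameras could be recovered from matched metric 3D boxes. It is an illustration under stated assumptions, not the paper's actual estimator: it uses only the box centers, assumes the matches are already given and outlier-free, and solves the alignment with the standard Kabsch (orthogonal Procrustes) method. The function name `relative_pose_from_box_centers` and its arguments are hypothetical.

```python
import numpy as np


def relative_pose_from_box_centers(centers_a, centers_b):
    """Estimate R, t such that centers_b ~= R @ centers_a + t.

    Assumption (not from the paper): each matched object is reduced to its
    3D box center, expressed in the frame of camera A and camera B
    respectively. Because the boxes are metric, no scale is estimated.
    """
    P = np.asarray(centers_a, dtype=float)
    Q = np.asarray(centers_b, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3 or P.shape[0] < 3:
        raise ValueError("need two (N, 3) arrays with N >= 3 matched centers")

    # Center both point sets so that only the rotation remains to be solved.
    mu_p = P.mean(axis=0)
    mu_q = Q.mean(axis=0)

    # Cross-covariance between the centered sets, then its SVD (Kabsch).
    H = (P - mu_p).T @ (Q - mu_q)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Translation maps the rotated centroid of A onto the centroid of B.
    t = mu_q - R @ mu_p
    return R, t
```

At least three non-collinear matched objects are needed for the rotation to be uniquely determined from centers alone. A practical system would presumably also need to handle mismatches and noisy boxes (for example with a robust wrapper such as RANSAC), which this sketch leaves out.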