CVPR 2025

Cubify Anything: Scaling Indoor 3D Object Detection

Justin Lazarow, David Griffiths, Gefen Kohavi, Francisco Crespo, Afshin Dehghan

Abstract

CA-1M is the first dataset to provide explicit 3D boxes that cover the full richness of objects while being both spatially accurate and pixel-perfect with respect to each frame. Existing datasets such as SUN RGB-D, ScanNet v2, and ARKitScenes are either small, coarsely labeled, or lack accurate mappings from world to image space. Because ARKitScenes and CA-1M are labeled on the same underlying data, we can show the effect of exhaustive labeling.