ICCV2023

Uni-3D: A Universal Model for Panoptic 3D Scene Reconstruction

Xiang Zhang, Zeyuan Chen, Fangyin Wei, Zhuowen Tu

24 citations

Abstract

Performing holistic 3D scene understanding from a single-view observation, involving generating instance shapes and 3D scene segmentation, is a long-standing challenge. Prevailing works either focus only on geometry or segmentation, or model the task in two folds by separate modules, whose results are merged later to form the final prediction. Inspired by recent advances in 2D vision that unify image segmentation and detection by Transformerbased models, we present Uni-3D, a holistic 3D scene parsing/reconstruction system for a single RGB image. Uni-3D features a universal model with query-based representations for predicting segments of both object instances and scene layout. In Uni-3D, we also introduce a single Transformer for 2D depth-aware panoptic segmentation, which offers queries that serve as strong shape priors in 3D. Uni-3D seamlessly integrates 2D and 3D in its architecture and it outperforms previous methods significantly. * indicates equal contribution. Code: https://github.com/mlpc-ucsd/Uni-3D .