CVPR 2025

Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation

Junha Lee, Chunghyun Park, Jaesung Choe, Yu-Chiang Frank Wang, Jan Kautz, Minsu Cho, Christopher B. Choy

Abstract

Figure 1. Mosaic3D-5.6M. Mosaic3D-5.6M is a large-scale dataset generated from a collection of existing datasets [7, 14, 24, 99, 107]. It consists of 5.6M mask-text pairs, each pairing a fine-grained mask (black outline in the figure) with a detailed caption (text in the matching color). Using this large-scale dataset, we propose Mosaic3D, a foundation model for open-vocabulary 3D segmentation.