CVPR 2025

Functionality Understanding and Segmentation in 3D Scenes

Jaime Corsetti, Francesco Giuliari, Alice Fasoli, Davide Boscaini, Fabio Poiesi

Abstract

[Figure 1 example inputs: "Turn on the ceiling light using the switch next to the TV"; "Open the top right drawer of the cabinet with the TV on top"; "Open the second drawer of the cabinet to the left of the TV". Diagram labels: Fun3DU; World Knowledge & Vision Perception.]

Figure 1. We present Fun3DU, the first method for functionality understanding and segmentation in 3D scenes. Fun3DU interprets natural language descriptions (left-hand side) in order to segment functional objects in real-world 3D environments (right-hand side). Fun3DU relies on world knowledge and vision perception capabilities of pre-trained vision and language models, without requiring task-specific finetuning.