CVPR 2025
PerLA: Perceptive 3D Language Assistant
Guofeng Mei, Wei Lin, Luigi Riz, Yujiao Wu, Fabio Poiesi, Yiming Wang
Abstract
Figure 1. PerLA is a 3D language assistant that integrates local details with global context to learn informative representations of 3D scenes, whereas state-of-the-art (SOTA) 3DLAs focus solely on global context information. PerLA can provide more accurate responses, correctly distinguishing between objects such as a "black computer monitor" and a "black suitcase," where SOTA models instead fail with hallucinated responses. Examples in figures show cases where capturing details from the point cloud matters for accurate output captions.