CVPR 2025

Visual Agentic AI for Spatial Reasoning with a Dynamic API

Damiano Marsili, Rohun Agrawal, Yisong Yue, Georgia Gkioxari

Abstract

Figure 1. Spatial reasoning in 3D is challenging because it requires multiple steps of grounding and inference. We introduce a benchmark for 3D understanding with complex queries; an example is shown here. To tackle these queries, we propose VADAR, a training-free agentic approach that dynamically generates new skills in Python and can therefore handle a wider range of queries than prior methods.