CVPR 2025

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

Yunze Man, De-An Huang, Guilin Liu, Shiwei Sheng, Shilong Liu, Liang-Yan Gui, Jan Kautz, Yu-Xiong Wang, Zhiding Yu

Abstract

[Figure 1 panel text: "Room is filled with stuffed animal toys." | "closest silver SUV", "man carrying bag of bottles", <box_1> <box_2> | "Liquid color in the glass on the table?" → <box> → ctx-token: color is green]

Figure 1. Visual question answering, grounding, and chain-of-thought reasoning with Argus. "ctx-token" is short for context token.