CVPR 2025

Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

Wenbin An, Feng Tian, Sicong Leng, Jiahao Nie, Haonan Lin, Qianying Wang, Ping Chen, Xiaoqin Zhang, Shijian Lu

Abstract

Although our work performs strongly in mitigating hallucinations and enhancing the general perception capabilities of LVLMs, it could be improved in several respects. First, due to resource constraints, we conducted experiments only on the most widely used LVLMs. It would be useful to evaluate our model on larger LVLMs such as LLaVA 34B and Flamingo 70B [4]. Second, this work focuses on text and image data; it could be extended to other modalities such as video. We will examine these problems in future work.