EMNLP 2025

SHARP: Steering Hallucination in LVLMs via Representation Engineering

Junfei Wu, Yue Ding, Guofan Liu, Tianze Xia, Ziyue Huang, Dianbo Sui, Qiang Liu, Shu Wu, Liang Wang, Tieniu Tan

Abstract

Despite their impressive capabilities, Large Vision-Language Models (LVLMs) frequently generate plausible yet incorrect or unsupported responses, referred to as hallucinations. In this study, we investigate whether different types of hallucinations are reflected in the model's internal representations by probing their encoded features. We focus on two causes of hallucination in multimodal reasoning, (1) overreliance on textual priors and (2) preference for user prompts over conflicting visual evidence, both of which have been identified in prior work as frequent and impactful factors. Our probing results reveal that hallucinations exhibit distinguishable representational patterns, suggesting that a representation-level approach can characterize and mitigate them. Motivated by this, we propose Steering HAllucination via RePresentation Engineering (SHARP), a representation-level intervention framework that modulates hallucination-related features during inference. SHARP identifies the functional representations responsible for prior-driven and visual-context conflicts and jointly adjusts the corresponding internal activations. We evaluate our approach extensively on three LVLMs across various benchmarks. Experimental results show that the proposed intervention effectively reduces hallucinations without compromising the performance and generalization of the LVLMs.
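To make the idea of a representation-level intervention concrete, the following is a minimal sketch on synthetic data, not the SHARP implementation. It assumes a difference-of-means direction as a stand-in for whatever feature extraction the paper uses, and all names (`steering_direction`, `steer`, `probe_score`, `alpha`) and the toy activations are hypothetical. It illustrates two points from the abstract: hallucination types can be separable along directions in hidden space, and several such directions can be adjusted jointly at inference time.

```python
import numpy as np


def steering_direction(hallucinated, faithful):
    """Unit vector from the faithful mean to the hallucinated mean.

    Assumption: a difference-of-means direction, a common
    representation-engineering baseline, not necessarily SHARP's.
    """
    diff = hallucinated.mean(axis=0) - faithful.mean(axis=0)
    return diff / np.linalg.norm(diff)


def probe_score(hidden, direction):
    """Project hidden states onto a direction (a simple linear probe)."""
    return hidden @ direction


def steer(hidden, directions, alpha=1.0):
    """Jointly shift hidden states away from every given direction."""
    steered = hidden.copy()
    for d in directions:
        steered = steered - alpha * d
    return steered


# Synthetic stand-ins for hidden states (hypothetical, 16-dimensional).
rng = np.random.default_rng(0)
dim = 16
axis_prior = np.zeros(dim)
axis_prior[0] = 1.0      # toy "textual prior" feature
axis_conflict = np.zeros(dim)
axis_conflict[1] = 1.0   # toy "prompt vs. image conflict" feature

faithful = rng.normal(size=(200, dim))
prior_h = rng.normal(size=(200, dim)) + 3.0 * axis_prior
conflict_h = rng.normal(size=(200, dim)) + 3.0 * axis_conflict

d_prior = steering_direction(prior_h, faithful)
d_conflict = steering_direction(conflict_h, faithful)

# Mean projection before and after steering along both directions at once.
before = probe_score(prior_h, d_prior).mean()
after = probe_score(
    steer(prior_h, [d_prior, d_conflict], alpha=3.0), d_prior
).mean()
print(f"mean projection before: {before:.2f}, after: {after:.2f}")
```

In a real LVLM the arrays would be activations collected at a chosen layer, and the shift would be applied inside the forward pass; which layers, directions, and strengths SHARP uses is not specified in the abstract.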