ICLR 2026

Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification

Tao Huang, Rui Wang, Xiaofei Liu, Yi Qin, Li Duan, Liping Jing

Cited by 1

Abstract

Large vision-language models (LVLMs) have achieved substantial advances in multimodal understanding. However, when presented with challenging or distribution-shifted inputs, they frequently produce unreliable or even harmful content, such as hallucinations or toxic responses. We refer to such misalignments with human expectations as misbehaviors of LVLMs, which raise serious concerns for their deployment in critical applications. Existing research has shown that such misbehaviors are closely linked to model uncertainty. We find that they primarily stem from two distinct sources of epistemic uncertainty: internal contradictions (conflict) and the absence of supporting information (ignorance). Existing uncertainty quantification methods typically capture only total predictive uncertainty and therefore struggle to distinguish between these underlying causes. To address this gap, we propose Evidential Uncertainty Quantification (EUQ), a training-free framework that explicitly decomposes epistemic uncertainty into conflict (CF) and ignorance (IG). Specifically, we interpret features from the model output head as either supporting (positive) or opposing (negative) evidence. Leveraging the Dempster-Shafer theory of belief functions, we aggregate this evidence to quantify internal conflict and knowledge gaps within a single forward pass. We extensively evaluate EUQ on state-of-the-art LVLMs across four misbehavior categories: hallucinations, jailbreaks, adversarial vulnerabilities, and out-of-distribution (OOD) failures. Experimental results demonstrate that EUQ consistently outperforms strong baselines, achieving relative improvements of up to 10.5% in AUROC. Our evaluation further reveals that hallucinations correspond to high internal conflict and OOD failures to high ignorance. Furthermore, a layer-wise analysis of evidential uncertainty dynamics provides a novel perspective on the evolution of internal representations.
The source code is available at https://github.com/HT86159/EUQ.
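
To make the conflict/ignorance distinction concrete, the following is a minimal sketch of how Dempster-Shafer evidence combination can separate the two quantities. It is not the EUQ implementation: the function names (`simple_support`, `combine`, `conflict_and_ignorance`), the two-element frame, and the evidence strengths are all hypothetical, and the step that turns output-head features into evidence is not reproduced here. The sketch assumes each piece of evidence is a simple support function, combines evidence with the unnormalized conjunctive rule, and reads conflict as the mass landing on the empty set and ignorance as the mass left on the whole frame. These are standard Dempster-Shafer quantities and may differ from the exact CF and IG definitions in the paper.

```python
from itertools import product


def simple_support(frame, focal, strength):
    """Mass function with `strength` on `focal` and the remainder on the whole frame."""
    mass = {}
    for subset, weight in ((frozenset(focal), strength),
                           (frozenset(frame), 1.0 - strength)):
        # Accumulate so the function stays valid even if focal equals the frame.
        mass[subset] = mass.get(subset, 0.0) + weight
    return mass


def combine(m1, m2):
    """Unnormalized conjunctive combination: multiply masses, intersect focal sets."""
    out = {}
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b  # an empty intersection means the two pieces contradict
        out[inter] = out.get(inter, 0.0) + wa * wb
    return out


def conflict_and_ignorance(frame, evidence):
    """Combine (focal_set, strength) pairs; return (conflict, ignorance)."""
    combined = {frozenset(frame): 1.0}  # vacuous mass: nothing is known yet
    for focal, strength in evidence:
        combined = combine(combined, simple_support(frame, focal, strength))
    conflict = combined.get(frozenset(), 0.0)        # mass on the empty set
    ignorance = combined.get(frozenset(frame), 0.0)  # mass still uncommitted
    return conflict, ignorance


# Hypothetical evidence strengths, chosen only for illustration.
frame = {"cat", "dog"}
strong_clash = conflict_and_ignorance(frame, [({"cat"}, 0.8), ({"dog"}, 0.8)])
weak_evidence = conflict_and_ignorance(frame, [({"cat"}, 0.1), ({"dog"}, 0.1)])
agreement = conflict_and_ignorance(frame, [({"cat"}, 0.8), ({"cat"}, 0.8)])

print("strong clash  (conflict, ignorance):", strong_clash)
print("weak evidence (conflict, ignorance):", weak_evidence)
print("agreement     (conflict, ignorance):", agreement)
```

In this toy setting, two strong but contradictory pieces of evidence yield high conflict and low ignorance, two weak pieces yield low conflict and high ignorance, and two agreeing pieces yield neither. A single total-uncertainty score could not tell the first two cases apart, which is the distinction the abstract draws between hallucinations and OOD failures.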