ICLR 2025
ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability
Zhongxiang Sun, Xiaoxue Zang, Kai Zheng, Jun Xu, Xiao Zhang, Weijie Yu, Yang Song, Han Li
Abstract
Attention modules are more 'truthful' than other modules in LLMs (e.g., FFN modules).
➢ Knowledge is mainly stored in the FFN modules of the transformer layers in pre-trained language models [1].
➢ Even if the self-attention module correctly focuses on the relevant token, the FFN module may still produce factuality hallucinations due to insufficient pre-training [2].
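
To make the attention/FFN distinction concrete, the following is a minimal NumPy sketch of a single transformer layer that keeps the two modules' contributions to the residual stream separate. It is an illustration only, not the ReDeEP method: the function names (`attention`, `ffn`, `block`), the toy sizes, the random weights, and the omission of layer normalization and multiple heads are all assumptions made for brevity.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, Wq, Wk, Wv, Wo):
    """Single-head causal self-attention (hypothetical toy version).

    Returns the module output and the attention weights, i.e. which
    earlier tokens each position reads from.
    """
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    # Causal mask: position i may only attend to positions <= i.
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    return weights @ v @ Wo, weights

def ffn(x, W1, W2):
    """Position-wise feed-forward module with ReLU (toy version)."""
    return np.maximum(x @ W1, 0.0) @ W2

def block(x, params):
    """One layer without layer norm (an assumption made for brevity).

    The output is x + attn_out + ffn_out, so each module's additive
    contribution to the residual stream can be inspected on its own.
    """
    attn_out, weights = attention(x, *params["attn"])
    mid = x + attn_out
    ffn_out = ffn(mid, *params["ffn"])
    return mid + ffn_out, attn_out, ffn_out, weights

# Toy sizes and random weights, purely illustrative.
rng = np.random.default_rng(0)
T, d, d_ff = 4, 8, 16
params = {
    "attn": [rng.normal(scale=0.1, size=(d, d)) for _ in range(4)],
    "ffn": [rng.normal(scale=0.1, size=(d, d_ff)),
            rng.normal(scale=0.1, size=(d_ff, d))],
}
x = rng.normal(size=(T, d))
out, attn_out, ffn_out, weights = block(x, params)
```

In this sketch, `weights` shows where the attention module looks in the context, while `ffn_out` is the update computed from the FFN's own parameters; the two bullets above concern exactly these two separate additive terms.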