WWW2026
IRAG: Robust Multimodal Retrieval-Augmented Generation via Hazard Separation
Ruikun Luo, Zixiao Feng, Lin Gu, Xiaoyu Xia
Abstract
Multimodal Retrieval-Augmented Generation (MM-RAG) extends the capabilities of Large Language Models (LLMs) by incorporating external image-text knowledge bases to handle various tasks. However, MM-RAG systems in open environments are highly vulnerable to retrieval poisoning attacks, in which adversaries inject malicious image-text pairs that are then retrieved and dominate the generation process, leading to incorrect or harmful outputs. Because of the unique challenges of image-text fusion and cross-modal interference, existing defenses for text-based RAG cannot be directly applied to multimodal scenarios. In this paper, we propose IRAG, the first robust defense framework specifically designed for MM-RAG. The core of IRAG is hazard separation, a structured defense that isolates potential contamination sources by leveraging redundancy and consensus. This enhances system robustness and ensures reliable outputs even when portions of the retrieved content are compromised. Extensive experiments on the MMQA and WebQA datasets under the BQI and ROTI poisoning schemes demonstrate that IRAG consistently restores system reliability: normal answer accuracy improves by 15–30% (restoring it to pre-poisoning levels), while the poisoned answer rate is reduced to below 7%.
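As an informal illustration of isolating contamination sources through redundancy and consensus, the following minimal sketch answers from each retrieved item in isolation and keeps an answer only if enough items agree. This is not IRAG's actual algorithm, which the abstract does not specify. The function `isolate_and_vote`, the callback `answer_fn` (standing in for one model call on a single image-text pair), and the threshold `min_votes` are all hypothetical names introduced here.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, TypeVar

Item = TypeVar("Item")  # one retrieved image-text pair (representation is an assumption)


def isolate_and_vote(
    items: Sequence[Item],
    answer_fn: Callable[[Item], str],
    min_votes: int = 2,
) -> Optional[str]:
    """Answer from each retrieved item separately, then return the consensus.

    Because every item is processed in isolation, a poisoned item can
    affect only its own vote and cannot dominate the other contexts.
    Returns None when no answer reaches `min_votes`.
    """
    # Hypothetical per-item generation: one answer per isolated context.
    answers = [answer_fn(item) for item in items]

    # Normalize lightly so trivially different strings count as one vote.
    votes = Counter(a.strip().lower() for a in answers if a and a.strip())
    if not votes:
        return None

    answer, count = votes.most_common(1)[0]
    return answer if count >= min_votes else None


if __name__ == "__main__":
    # Toy data: three clean items agree, one injected item disagrees.
    retrieved = ["Paris", "paris ", "Berlin", "Paris"]
    print(isolate_and_vote(retrieved, answer_fn=lambda item: item))
```

In this toy run the single injected item is outvoted by the three clean ones. Such a vote holds only while poisoned items remain a minority of the retrieved set, which mirrors the setting in which portions, but not all, of the retrieved content are compromised.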