WWW2026

An 8-Way Taxonomy for Multimodal Disinformation and Detection Benchmark

Shuhan Cui, Ruimin Chu, Hanrui Wang, Patrick H. Chen, Ching-Chun Chang, Isao Echizen

Abstract

Multimodal disinformation can originate in the image, the text, or the consistency between them, and these factors can mislead alone or in combination. Existing research, however, has not considered the full range of such combinations, which leaves current methods vulnerable to combinations they have not seen. To address this gap, we propose a tri-axis formulation that explicitly separates image veracity, text veracity, and cross-modal consistency. The formulation yields an 8-way space that enumerates all cases and turns multimodal disinformation detection, formerly a coarse multi-label problem, into a well-defined classification task with mutually exclusive classes. To support this formulation, we construct OctantFake, an 8-way dataset sourced from 16 datasets with complete label coverage and broad class diversity. Experiments show that existing methods struggle on this challenging dataset. Building on this dataset, we introduce OctantAgent, an LVLM-based parallel framework with three modules (image check, text check, and consistency check) whose predictions are fused into an 8-way decision, achieving clear improvements over existing LVLM-based methods. Together, these contributions offer an interpretable and exhaustive 8-way perspective that provides a solid foundation for analyzing multimodal disinformation.
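
To make the 8-way space concrete, the sketch below treats each of the three axes as a binary judgment and fuses them into a single class index. It is only an illustration under stated assumptions: the names `Verdict`, `fuse`, and `all_classes`, the treatment of each axis as a boolean, and the bit ordering are all hypothetical and are not taken from the OctantFake label scheme or the OctantAgent implementation.

```python
from itertools import product
from typing import Dict, NamedTuple


class Verdict(NamedTuple):
    """One binary judgment per axis (hypothetical representation)."""
    image_real: bool   # image veracity
    text_real: bool    # text veracity
    consistent: bool   # cross-modal consistency


def fuse(v: Verdict) -> int:
    """Map three binary judgments to one of 8 mutually exclusive classes.

    Assumed encoding: each failed check sets one bit, so class 0 means
    all three checks passed and class 7 means all three failed.
    """
    return (
        (int(not v.image_real) << 2)
        | (int(not v.text_real) << 1)
        | int(not v.consistent)
    )


def all_classes() -> Dict[int, Verdict]:
    """Enumerate every combination of the three axes, keyed by class index."""
    return {
        fuse(Verdict(*bits)): Verdict(*bits)
        for bits in product([True, False], repeat=3)
    }
```

Because every combination of the three judgments maps to exactly one index, the classes are exhaustive and mutually exclusive, which is the property that separates this setup from a multi-label formulation. For example, `fuse(Verdict(image_real=True, text_real=False, consistent=True))` falls in a different class from any verdict that also fails the consistency check.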