WWW2026

Enhancing Multi-Modal Entity Alignment via Multi-Grained Decision Fusion

Yu Xing, Qizhuo Xie, You Lv, Ziyang Zhou, Qianzi Hou, Qing Gu, Bin Chong, Tieke He

Abstract

Multi-modal entity alignment (MMEA) aims to identify equivalent entities across heterogeneous multi-modal knowledge graphs (MMKGs), which play a crucial role in organizing and integrating web knowledge from diverse modalities. Although prior studies have made progress through multi-modal feature fusion, three inherent limitations remain unresolved. First, instance-level feature fusion is misaligned with the pair-wise task format of MMEA. Second, joint representations often overlook modality-specific characteristics, resulting in insufficient alignment. Third, most existing methods rely solely on the global features of each modality, which may lead to the misalignment of entities that are similar yet distinct. To address these issues, we propose DMEA, a new decision-fusion-based framework. Specifically, we first design a multi-modal knowledge encoding module that extracts both global and local features for different modalities, and then introduce a multi-grained alignment module consisting of two components: intra-modal alignment and cross-modal alignment. The former computes alignment scores between the global and local features of entity pairs within the same modality, while the latter leverages the complementarity across modalities to compute cross-modal alignment scores. Each score is regarded as an independent decision, and the final alignment judgment is made by integrating all decisions. Finally, we incorporate an intra-modal contrastive loss to obtain more discriminative embedding representations. DMEA improves Hits@1 by 13.2% and 14.8% over state-of-the-art models on two benchmark datasets, FB15K-DB15K and FB15K-YAGO15K, respectively, validating the superiority of our framework.
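To make the contrast between feature fusion and decision fusion concrete, the following is a minimal NumPy sketch under stated assumptions. It is not the DMEA implementation, and the abstract does not specify the scoring or integration rules. Every function name here is hypothetical. The local score uses an assumed max-sim rule (best match per local feature, then averaged). Decisions are combined by an assumed rule: z-normalize each score matrix, then take a weighted average. The contrastive term is a generic InfoNCE loss that assumes row i of the source aligns with row i of the target. A cross-modal decision would simply be one more (n_src, n_tgt) score matrix passed to `fuse_decisions`.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def global_scores(src, tgt):
    """Cosine similarity of global embeddings: (n_src, d), (n_tgt, d) -> (n_src, n_tgt)."""
    return l2_normalize(src) @ l2_normalize(tgt).T


def local_scores(src_local, tgt_local):
    """Hypothetical max-sim local score.

    src_local: (n_src, k, d), tgt_local: (n_tgt, k, d) -> (n_src, n_tgt).
    For each local feature of a source entity, take its best-matching local
    feature of the target entity, then average over the source's features.
    """
    s = l2_normalize(src_local)
    t = l2_normalize(tgt_local)
    # sim[i, j, p, q]: cosine between local feature p of source i and q of target j
    sim = np.einsum("ipd,jqd->ijpq", s, t)
    return sim.max(axis=3).mean(axis=2)


def fuse_decisions(score_mats, weights=None):
    """Decision-level fusion: every score matrix is treated as one decision.

    Assumed integration rule: z-normalize each matrix so that the scales are
    comparable, then take a weighted average (uniform weights by default).
    """
    if weights is None:
        weights = np.full(len(score_mats), 1.0 / len(score_mats))
    fused = np.zeros_like(score_mats[0], dtype=float)
    for w, s in zip(weights, score_mats):
        fused += w * (s - s.mean()) / (s.std() + 1e-12)
    return fused


def intra_modal_infonce(src, tgt, tau=0.1):
    """Generic InfoNCE loss for one modality, assuming src[i] aligns with tgt[i]."""
    logits = global_scores(src, tgt) / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k, d = 5, 3, 16
    # Toy data: each target entity is a slightly perturbed copy of its source.
    src_g = rng.normal(size=(n, d))
    tgt_g = src_g + 0.01 * rng.normal(size=(n, d))
    src_l = rng.normal(size=(n, k, d))
    tgt_l = src_l + 0.01 * rng.normal(size=(n, k, d))

    decisions = [global_scores(src_g, tgt_g), local_scores(src_l, tgt_l)]
    fused = fuse_decisions(decisions)
    hits_at_1 = np.mean(fused.argmax(axis=1) == np.arange(n))
    print("Hits@1:", hits_at_1)
    print("InfoNCE:", intra_modal_infonce(src_g, tgt_g))
```

The design point the sketch illustrates is that fusion happens after each modality and granularity has produced its own pairwise score matrix, rather than by merging entity features into one joint vector first. This keeps modality-specific evidence separate until the final judgment.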