WWW2026

Multi-Granularity Multi-Modal Knowledge Graph Representation Learning via Subgraph-Aware Adaptive Fusion and Hierarchical Relation Modeling

Peining Li, Meiyu Liang, Wei Huang, Junping Du, Zhe Xue, Guanhua Ye, Wu Liu, Lei Shi

Abstract

Multi-modal knowledge graphs (MMKGs) enrich traditional knowledge graphs by incorporating heterogeneous modalities such as textual descriptions and visual content, offering complementary semantic cues for knowledge reasoning. However, existing approaches often overlook the structural dependencies within each modality, apply static or coarse-grained fusion strategies, and insufficiently model relational semantics. We propose SAFER, a Multi-Granularity Multi-Modal Knowledge Graph Representation Learning method via Subgraph-Aware Adaptive Fusion and Hierarchical Relation Modeling, which learns multi-modal knowledge representations by adaptively fusing multi-granularity information, including multi-modal semantics, knowledge structures, and relations. SAFER explicitly constructs modality-specific subgraphs and employs structure-aware graph attention networks to capture intra-modal structural dependencies. An adaptive multi-modal fusion mechanism then aggregates modality-specific embeddings at the semantic level by dynamically assigning entity-specific modality weights. Finally, a two-stage multi-granularity knowledge relation modeling strategy combines a structure-aware multi-modal adaptive pre-fusion step, which preserves topological information, with a relation-aware graph attention network (RGAT) post-fusion step, which encodes relational semantics. Extensive experiments on several benchmark datasets demonstrate that SAFER significantly outperforms competitive baselines on link prediction and relation reasoning tasks.
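To make the idea of entity-specific modality weights concrete, the following is a minimal sketch and not the SAFER implementation. The abstract does not specify how the weights are computed, so the softmax gate, the shared `gate_vector`, the function names, and the three modality names used here are all illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_entity(modality_embeddings, gate_vector):
    """Fuse one entity's modality-specific embeddings into a single vector.

    modality_embeddings: dict mapping a modality name to a list of floats,
        all of the same length as gate_vector.
    gate_vector: hypothetical learned parameter (assumption, not from the
        paper); the dot product with each embedding gives that modality's score.

    Returns (fused_embedding, weights_by_modality).
    """
    names = sorted(modality_embeddings)
    # One scalar score per modality, computed from this entity's own embedding,
    # so different entities can end up with different modality weights.
    scores = [
        sum(g * x for g, x in zip(gate_vector, modality_embeddings[name]))
        for name in names
    ]
    weights = softmax(scores)
    dim = len(gate_vector)
    # Weighted sum of the modality embeddings, one dimension at a time.
    fused = [
        sum(w * modality_embeddings[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


if __name__ == "__main__":
    # Toy embeddings for a single entity (values are made up for illustration).
    entity = {
        "structure": [1.0, 0.0],
        "text": [0.0, 1.0],
        "image": [1.0, 1.0],
    }
    fused, weights = fuse_entity(entity, gate_vector=[2.0, 0.0])
    print("weights:", weights)
    print("fused:", fused)
```

Because the scores depend on each entity's own embeddings, the weights vary per entity rather than being fixed per modality, which is the contrast with static fusion that the abstract draws. In a trained model the gate would be learned jointly with the encoders; here it is a fixed toy value.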