ICLR 2026
G-Merging: Graph Models Merging for Parameter-Efficient Multi-Task Knowledge Consolidation
Jun Chen, Ziyue Qiao, Qin Zhang, Kaize Ding, Xiao Luo
Abstract
The pretrain-finetune paradigm has achieved notable success in graph learning. Merging models fine-tuned on different tasks into a single parameter-efficient model with multi-task capabilities is also gaining attention for its practicality. However, existing model merging methods, such as weight averaging and task arithmetic, struggle to generalize to graph structures and Graph Neural Network (GNN) models because of the structural heterogeneity of graph data. In this paper, we propose G-Merging, a graph model merging framework for combining multiple task-specific fine-tuned GNN models. G-Merging first employs task arithmetic to coarsely merge graph models, capturing shared cross-task knowledge. Second, it introduces a Topology-aware Wasserstein Distance (TWD) loss to train lightweight task adapters on top of the merged model, preserving domain-specific graph patterns by aligning the embeddings of the merged and fine-tuned models. Third, G-Merging integrates the adapters into a mixture-of-experts (MoE) architecture with a training-free, topology-aware router that dynamically routes input graphs to task-specific adapters based on structural similarity, thereby mitigating conflicts and enhancing knowledge sharing. Extensive experiments on 8 downstream graph datasets demonstrate the effectiveness of the merged model, whose performance is close to or exceeds that of the individual fine-tuned models while improving parameter and training efficiency. Our code is available at https://github.com/cjcj46262/G-Merging.
➊ This paper proposes G-Merging, a novel approach to merging fine-tuned graph models via task arithmetic and TWD-based adapter routing, resolving cross-domain structural heterogeneity and consolidating task-specific knowledge while enabling cross-task knowledge sharing.
➋ This paper proposes a topology-aware, training-free MoE that dynamically selects adapters at inference, enabling efficient cross-task knowledge transfer and multi-task generalization.
➌ Extensive experiments demonstrate that G-Merging not only matches or exceeds the performance of individual fine-tuned models but also improves storage and training efficiency. The framework is also model-agnostic, supporting integration with various graph models.
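The coarse merging step mentioned in the abstract relies on task arithmetic. As a minimal sketch of generic task arithmetic (not the authors' implementation), the merged weights can be formed by adding the scaled sum of task vectors, i.e. fine-tuned weights minus pretrained weights, back onto the pretrained weights. The parameter name `gnn.layer0.weight`, the toy values, and the `scale` coefficient below are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Assumption: each model is represented as parameter name -> flattened weights.
StateDict = Dict[str, List[float]]


def task_vector(pretrained: StateDict, finetuned: StateDict) -> StateDict:
    """Task vector: fine-tuned weights minus pretrained weights, per parameter."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def task_arithmetic_merge(
    pretrained: StateDict,
    finetuned_models: List[StateDict],
    scale: float = 1.0,
) -> StateDict:
    """Add the scaled sum of all task vectors onto the pretrained weights."""
    merged = {name: list(values) for name, values in pretrained.items()}
    for finetuned in finetuned_models:
        delta = task_vector(pretrained, finetuned)
        for name, values in delta.items():
            merged[name] = [m + scale * d for m, d in zip(merged[name], values)]
    return merged


# Toy example: one shared parameter fine-tuned on two hypothetical tasks.
pretrained = {"gnn.layer0.weight": [1.0, 2.0, 3.0]}
task_a = {"gnn.layer0.weight": [1.5, 2.0, 3.0]}
task_b = {"gnn.layer0.weight": [1.0, 2.0, 2.0]}

merged = task_arithmetic_merge(pretrained, [task_a, task_b], scale=0.5)
print(merged)  # {'gnn.layer0.weight': [1.25, 2.0, 2.5]}
```

In this sketch a single scalar `scale` weights every task vector equally. According to the abstract, G-Merging uses this kind of merge only as a coarse first step and then refines it with task adapters and routing, which the sketch does not cover.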