KDD 2025
FlexGNN: A High-Performance, Large-Scale Full-Graph GNN System with Best-Effort Training Plan Optimization
Jeongmin Bae, Donghyoung Han, Min-Soo Kim
Abstract
Recently, full-graph Graph Neural Networks (GNNs) have gained prominence by addressing complex problems such as weather forecasting and material discovery. Existing full-graph training methods do not fully manage intermediate data generated during training and rely on rigid inter-GPU communication, limiting both training speed and scale. We propose FlexGNN, which fully manages intermediate data and adaptively performs inter-GPU communication by generating and optimizing best-effort training execution plans. Extensive experiments demonstrate that FlexGNN significantly outperforms existing full-graph GNN methods in both training speed and scale. Specifically, it is up to 5.4× faster than HongTu and up to 95.5× faster than NeutronStar.