WWW 2026
Dir-GD: Directed Graph Distillation
Ce Yang, Fei Hao, Jie Gao, Jianrui Chen, Jia Hu, Geyong Min
Abstract
Graph-structured data captures complex relationships in diverse domains such as social networks, financial transactions, citation networks, and recommendation systems. Graph Neural Networks (GNNs) excel at learning intricate topological patterns, yielding strong performance on tasks such as node classification and link prediction. However, real-world graphs often scale to millions of nodes and billions of directed edges, so the computational and storage demands of GNN training frequently exceed available hardware limits. Graph sampling and distillation techniques alleviate these issues by subsampling the graph or constructing a surrogate graph, but they primarily target undirected graphs and neglect the directional semantics that are crucial for applications such as fraud detection and causal analysis. To address these limitations, we introduce the Directed Graph Distillation (Dir-GD) framework, which combines distributed learning with community detection to divide a large directed graph into independent subgraphs for distributed directed GNN training. The process culminates in parameter aggregation, which produces a compact global synthetic graph that preserves essential topology and directionality. Extensive experiments on large-scale datasets, such as the million-node soc-pokec-relationships, demonstrate over 91% accuracy at a distillation ratio of 0.001, accompanied by substantial memory and runtime savings. This work pioneers directed graph distillation as a key paradigm for analyzing ultra-large directed graphs, offering a scalable solution that maintains high fidelity in compressed representations.
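The pipeline outlined in the abstract (partition by community, train per subgraph, aggregate parameters) can be illustrated with a minimal Python sketch. It is not the Dir-GD implementation, and it rests on several assumptions that the abstract does not state: community labels are taken as given, cross-community edges are simply dropped so that the subgraphs are independent, aggregation is a size-weighted average in the style of federated averaging, and the distillation ratio is read as the number of synthetic nodes divided by the number of original nodes. The function names `split_by_community`, `aggregate`, and `synthetic_size` are hypothetical.

```python
from collections import defaultdict


def split_by_community(edges, community):
    """Group directed edges (u, v) by community label.

    Assumption: only intra-community edges are kept, so the resulting
    subgraphs share no edges. Edge direction (u -> v) is preserved.
    """
    subgraphs = defaultdict(list)
    dropped = 0
    for u, v in edges:
        if community[u] == community[v]:
            subgraphs[community[u]].append((u, v))
        else:
            dropped += 1  # cross-community edge, discarded in this sketch
    return dict(subgraphs), dropped


def aggregate(params, weights):
    """Weighted average of per-subgraph parameter vectors.

    Assumption: a FedAvg-style average; the paper's actual aggregation
    rule is not given in the abstract.
    """
    total = sum(weights)
    dim = len(params[0])
    return [
        sum(w * p[i] for p, w in zip(params, weights)) / total
        for i in range(dim)
    ]


def synthetic_size(num_nodes, ratio):
    """Number of synthetic nodes for a given distillation ratio (assumed definition)."""
    return max(1, round(num_nodes * ratio))


# Toy directed graph with two communities; the edge 1 -> 2 crosses them.
edges = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 2)]
community = {0: 0, 1: 0, 2: 1, 3: 1}
subgraphs, dropped = split_by_community(edges, community)

# Placeholder parameter vectors standing in for locally trained models.
global_params = aggregate([[1.0, 2.0], [3.0, 4.0]], weights=[1, 3])
```

Under the assumed definition of the ratio, a graph with one million nodes distilled at 0.001 would correspond to roughly 1,000 synthetic nodes, which gives a sense of the compression the abstract reports.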