ICLR 2025

Bonsai: Gradient-free Graph Condensation for Node Classification

Mridul Gupta, Samyak Jain, Vansh Ramani, Hariprasad Kodamana, Sayan Ranu

Abstract

Graph condensation has emerged as a promising avenue to enable scalable training of GNNs by compressing the training dataset while preserving essential graph characteristics. Our study uncovers significant shortcomings in current graph condensation techniques. First, the majority of the algorithms paradoxically require training on the full dataset to perform condensation. Second, due to their gradient-emulating approach, these methods require fresh condensation for any change in hyper-parameters or GNN architecture, limiting their flexibility and reusability. To address these challenges, we present Bonsai, a novel graph condensation method built on the observation that computation trees form the fundamental processing units of message-passing GNNs. Bonsai condenses datasets by encoding a careful selection of exemplar trees that maximize the representation of all computation trees in the training set. This approach makes Bonsai the first linear-time, model-agnostic graph condensation algorithm for node classification that outperforms existing baselines across 7 real-world datasets on accuracy, while being 22 times faster on average. Bonsai is grounded in rigorous mathematical guarantees on the adopted approximation strategies, making it robust to GNN architectures, datasets, and parameters.

*Denotes equal contribution.

¹Some algorithms sparsify the fully-connected graph based on edge weights. But this sparsification process requires training on the fully connected graph itself to identify the pruning threshold.

²Inspired by the art of Bonsai, which transforms large trees into miniature forms while preserving their essence, our graph condensation algorithm gracefully prunes redundant computation trees, creating a condensed graph that is significantly smaller yet maintains comparable performance.
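To make the idea of selecting exemplar trees that "maximize the representation" of all computation trees more concrete, the following is a minimal sketch of a greedy maximum-coverage selection. The abstract does not specify the selection rule, so this is an illustrative stand-in and not necessarily the procedure used by Bonsai. The function name `greedy_exemplars`, the `represents` mapping (which candidate tree is assumed to represent which other trees), and the `budget` parameter are all hypothetical.

```python
def greedy_exemplars(represents, budget):
    """Greedy maximum-coverage selection (illustrative assumption only).

    represents: dict mapping a candidate tree id to the set of tree ids
                it is assumed to represent (hypothetical input).
    budget:     maximum number of exemplars to select.
    Returns the chosen exemplar ids and the set of tree ids they cover.
    """
    covered = set()
    chosen = []
    for _ in range(budget):
        best, best_gain = None, 0
        for cand, items in represents.items():
            if cand in chosen:
                continue
            # Marginal gain: trees this candidate would newly represent.
            gain = len(items - covered)
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:
            # No remaining candidate adds coverage; stop early.
            break
        chosen.append(best)
        covered |= represents[best]
    return chosen, covered


# Toy input: five trees (0..4) and four candidate exemplars.
represents = {0: {0, 1, 2}, 1: {1, 2}, 2: {3, 4}, 3: {4}}
chosen, covered = greedy_exemplars(represents, budget=2)
print(chosen, sorted(covered))
```

In this toy input, two exemplars are enough to represent all five trees. This loop rescans every candidate at each step for clarity and is therefore not linear-time; it illustrates only the selection objective, not the complexity claimed in the abstract.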