ICML2025

Covered Forest: Fine-grained generalization analysis of graph neural networks

Antonis Vasileiou, Ben Finkelshtein, Floris Geerts, Ron Levie, Christopher Morris

Abstract

The expressive power of message-passing graph neural networks (MPNNs) is reasonably well understood, primarily through combinatorial techniques from graph isomorphism testing. However, MPNNs' generalization abilities-making meaningful predictions beyond the training set-remain less explored. Current generalization analyses often overlook graph structure, limit the focus to specific aggregation functions, and assume the impractical, hard-to-optimize 0-1 loss function. Here, we extend recent advances in graph similarity theory to assess the influence of graph structure, aggregation, and loss functions on MPNNs' generalization abilities. Our empirical study supports our theoretical insights, improving our understanding of MPNNs' generalization properties. abilities, e.g. Franks et al. [2023 ], Liao et al. [2021] , Maskey et al. [2022] , Morris et al. [2023], Scarselli et al. [2018] , study MPNNs' generalization abilities via Vapnik-Chervonenkis dimension theory (VC dimension) [Vapnik and Chervonenkis, 1964, Vapnik, 1995] or related formalisms and only derive generalization bounds that depend on simplistic graph parameters, such as the maximum degree or number of vertices. While Morris et al. [2023, Proposition 1 and 2] derived a tight connection between the expressivity of the 1-WL and MPNNs' VC dimension, their analysis assumes a hard-to-optimize 0-1 loss function and specific aggregation functions. Most importantly, the analysis in Morris et al. [2023], due to their tight connection to the 1-WL, implicitly uses a naive notion of graph similarity, namely, two graphs are exactly close if they are equivalent under 1-WL and otherwise far apart, ultimately leading to vacuous bounds not relevant to practitioners. Moreover, most analyses do not account for various architectural choices, e.g., the aggregation function. Present work Here, we leverage and extend modern generalization frameworks based on (data-dependent) covering numbers [Xu and Mannor, 2012, Kawaguchi et al., 2022] to address the above shortcomings. That is, we investigate how leveraging refined notions of graph similarity, or more precisely pseudo-metrics on the set of graphs, allows smaller coverings of the set of graphs, leading to tighter analysis of MPNNs' generalization error. See Figure 1 for an illustration of how different pseudo-metrics induce different geometries, leading to a tighter generalization analysis. In addition, our analysis also accounts for different architectural choices and loss functions. Concretely, 1. we show that the choice of the pseudo-metric on the set of n-order graphs can significantly impact the generalization analysis of MPNNs. More precisely, we show that using the Tree distance [Böker, 2021] leads to a strictly tighter analysis of the generalization error compared to the results of Morris et al. [2023]. We provide a general proof technique to derive such an improvement. Unlike Morris et al. [2023], our analysis covers many loss functions, such as the cross-entropy loss and mean absolute error. 2. We establish a connection between our developed pseudo-metrics on the set of graphs and the Tree Mover's distance (TMD) [Chuang and Jegelka, 2022] , offering a deeper understanding of the TMD, which is initially defined through a recursive formula solving a transportation problem. 3. In addition, we define the 1-MWL, a heuristic for the graph isomorphism problem that accurately characterizes the expressivity of MPNNs using mean aggregation. Subsequently, we define a pseudo-metric on the set of graphs equivalent to 1-MWL in distinguishing non-isomorphic graphs. We further show that this pseudo-metric satisfies the Lipschitz property for mean aggregation MPNNs as required for our generalization analysis. 4. Empirically, we show that our theoretical findings translate to practice, leading to a more nuanced understanding when MPNNs generalize. Overall, our results show how using refined notions of graph similarity leads to a more detailed understanding of MPNNs' generalization abilities, taking into account graph structure, different architectural choices, and loss functions. Related work In the following, we discuss relevant related work.