ISSTA2025

Bridging the Gaps between Graph Neural Networks and Data-Flow Analysis: The Closer, the Better

Qingchen Yu, Xin Liu, Qingguo Zhou, Chunming Wu

摘要

Recent advances in applying deep neural networks to programming tasks have achieved remarkable success in practice, prompting interest in exploring how well these models can perform traditional program analysis techniques. Data-flow analysis (DFA), a classic and well-established approach for analyzing programs, presents an opportunity to assess the capabilities of neural networks in this domain. Given the structural similarities between DFA and Graph Neural Networks (GNNs), we explore the extent to which GNNs can effectively model the DFA algorithm. Building on the concept of algorithmic alignment from Neural Algorithmic Reasoning (NAR), we identify two key challenges: the noninterference property of the bit-vectors used in DFA and the complex handling of external information at different stages of the algorithm. Addressing these gaps, we propose three GNN architectures — DFA-GNN − , DFA-GNN, and DFA-GNN + — that progressively align with the DFA algorithm. Our evaluations emphasize the generalization capacity of these models, particularly in scenarios where training occurs on smaller samples while testing on much larger inputs. Results demonstrate that GNNs with higher algorithmic alignment, such as DFA-GNN + , exhibit superior generalization and sample efficiency, accurately scaling to 10 times larger inputs with minimal training data. Notably, we show that GNNs trained with only input-output pairs can perform competitively with models trained using full execution trajectory supervision, a common practice in recent NAR studies. This finding highlights the efficiency and robustness of GNNs in reasoning tasks when algorithmically aligned with the target algorithm.