ASE2025

GRACG: Graph Retrieval Augmented Code Generation

Konstantin Fedorov, Boris Zarubin, Vladimir Ivanov

Abstract

While file-level context is an effective basis for many modern code generation tools powered by Large Language Models (LLMs), it may be insufficient to fully capture the structural and semantic dependencies that extend across entire repositories. To address this issue, we propose a graph-based Retrieval-Augmented Code Generation (GRACG) framework that leverages repository-level context. Our approach models the repository as a heterogeneous graph of files, classes, and functions. A Graph Neural Network (GNN) is used to generate node embeddings that incorporate information from connected nodes. These embeddings are precomputed and serve as an efficient index, allowing us to retrieve context for user queries without re-running the GNN. Retrieved nodes are then used to construct prompts for LLMs to perform repository-aware function generation. We evaluate retrieval performance on a benchmark where the goal is to identify functions likely to be called based on a natural language description. For end-to-end evaluation, we use pass@k, which measures the percentage of tasks where at least one of the top-k generated solutions passes the associated test cases. Our results show that graph-based retrieval outperforms classical methods, highlighting it as a promising direction for future research. However, in end-to-end code generation, we observe only non-significant improvements in the metrics, even when oracle functions are used. Importantly, the modular nature of our framework also makes it suitable as a building block in multi-agent settings, where specialized components for retrieval, generation, and verification can collaborate around repository-level knowledge.
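The two mechanisms described in the abstract, lookup against a precomputed embedding index and the pass@k metric, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cosine`, `retrieve`, `pass_at_k`), the toy two-dimensional vectors, and the use of cosine similarity as the ranking score are all assumptions made for illustration. The sketch also assumes the query has already been mapped into the same vector space as the node embeddings by some encoder, which is not shown.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float],
             index: Dict[str, Sequence[float]],
             top_n: int = 2) -> List[str]:
    """Rank precomputed node embeddings against a query vector.

    `index` maps a node name (e.g. a function) to its stored embedding,
    so no graph model has to run at query time. Cosine similarity is an
    assumed scoring choice, not one stated in the abstract.
    """
    ranked = sorted(index,
                    key=lambda name: cosine(query_vec, index[name]),
                    reverse=True)
    return ranked[:top_n]


def pass_at_k(results: List[List[bool]], k: int) -> float:
    """pass@k as defined in the abstract: the percentage of tasks where
    at least one of the top-k generated solutions passes its tests.

    `results[i][j]` is True if the j-th ranked solution for task i passed.
    """
    solved = sum(1 for task in results if any(task[:k]))
    return 100.0 * solved / len(results)


# Hypothetical index of function-node embeddings (toy values).
toy_index = {
    "parse_config": [1.0, 0.0],
    "load_file": [0.8, 0.6],
    "send_email": [0.0, 1.0],
}
top_nodes = retrieve([1.0, 0.1], toy_index, top_n=2)

# Hypothetical test outcomes for four tasks, two ranked solutions each.
toy_results = [[False, True], [False, False], [True, False], [False, False]]
score_at_1 = pass_at_k(toy_results, k=1)
score_at_2 = pass_at_k(toy_results, k=2)
```

In this toy setup the retrieved node names would be placed into the LLM prompt as repository context. Because the embeddings are stored ahead of time, the per-query cost is a similarity scan over the index rather than a forward pass through the graph model.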