ASE2025

Vul-R2: A Reasoning LLM for Automated Vulnerability Repair

Xin-Cheng Wen, Zirui Lin, Yijun Yang, Cuiyun Gao, Deheng Ye

Abstract

The exponential increase in software vulnerabilities has created an urgent need for automatic vulnerability repair (AVR) solutions. Recent research has formulated AVR as a sequence generation problem and leveraged large language models (LLMs) to address it. Typically, these approaches prompt or fine-tune LLMs to generate repairs for vulnerabilities directly. Although these methods achieve state-of-the-art performance, they face two challenges: (1) A lack of high-quality, vulnerability-related reasoning data. Current approaches primarily rely on foundation models that mainly encode general programming knowledge. Without vulnerability-related reasoning data, they tend to fail to capture the diverse vulnerability repair patterns. (2) Difficulty in verifying the intermediate vulnerability repair process during LLM training. Existing reinforcement learning methods often leverage intermediate execution feedback from the environment (e.g., sandbox-based execution results) to guide training. In contrast, the vulnerability repair process generally lacks such intermediate, verifiable feedback, which poses additional challenges for model training. To address these challenges, we propose to model the vulnerability repair task from a reasoning perspective and train a reasoning LLM termed Vulnerability Reasoner and Repair (Vul-R2), which consists of two key modules: (1) a domain-aware reasoning learning module, which comprises a reasoning answer construction component, a reasoning data filtering process, and a supervised fine-tuning process for learning vulnerability-related reasoning knowledge; and (2) a curriculum-based verifiable rewarded training module, which dynamically applies reinforcement learning with verifiable rewards, based on multiple-choice question answering in an easy stage and character-level matching in a hard stage.
We evaluate Vul-R2 on the real-world C/C++ dataset PrimeVul to demonstrate its effectiveness in vulnerability repair. Specifically, Vul-R2 outperforms the best baseline by 11.27% in exact match (EM) and successfully repairs 49 additional vulnerabilities. Furthermore, we demonstrate the effectiveness of the proposed paradigm: fine-tuning Vul-R2 on PrimeVul leads to improved EM performance of 8.78% on the human-curated dataset SVEN, even without additional training.
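
To make the two-stage reward described in the abstract concrete, the following is a minimal sketch of how a curriculum of verifiable rewards might be wired up. It is an illustration under stated assumptions, not the authors' implementation: the function names, the stage labels "easy" and "hard", and the use of `difflib.SequenceMatcher` as the character-level matching score are all hypothetical choices, since the abstract does not specify how either reward is computed.

```python
from difflib import SequenceMatcher


def easy_stage_reward(predicted_choice: str, correct_choice: str) -> float:
    """Easy stage (assumed form): binary reward for a multiple-choice answer.

    Returns 1.0 when the selected option label matches the correct one,
    ignoring case and surrounding whitespace, and 0.0 otherwise.
    """
    same = predicted_choice.strip().upper() == correct_choice.strip().upper()
    return 1.0 if same else 0.0


def hard_stage_reward(generated_patch: str, reference_patch: str) -> float:
    """Hard stage (assumed form): character-level similarity in [0, 1].

    SequenceMatcher.ratio() is a stand-in for whatever character-level
    matching score the paper actually uses.
    """
    return SequenceMatcher(None, generated_patch, reference_patch).ratio()


def curriculum_reward(stage: str, model_output: str, reference: str) -> float:
    """Dispatch to the reward that belongs to the current curriculum stage."""
    if stage == "easy":
        return easy_stage_reward(model_output, reference)
    if stage == "hard":
        return hard_stage_reward(model_output, reference)
    raise ValueError(f"unknown curriculum stage: {stage!r}")
```

Both rewards can be checked against a reference without executing the patched program, which is the property the abstract relies on when sandbox-style intermediate feedback is unavailable. How and when training switches from the easy stage to the hard stage is not stated in the abstract and is left out of this sketch.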