ASE2025

Not Every Patch is an Island: LLM-Enhanced Identification of Multiple Vulnerability Patches

Yi Song, Dongchen Xie, Lin Xu, He Zhang, Chunying Zhou, Xiaoyuan Xie

Abstract

When a vulnerability is reported on a platform such as CVE or NVD, software maintainers need to submit patches (in the form of code commits) to fix it, and they often do so silently to protect the product’s reputation or to avoid malicious attacks. Such a silent practice, however, keeps patches hidden from the maintainers of affected downstream software, who then have to identify the patches manually in a large corpus of code commits, a task known as silent vulnerability patch identification (SVPI). Existing techniques in this field were often developed under the assumption that a vulnerability is matched to exactly one patch, and thus output a ranking list that simply reflects the similarity between each individual patch and the vulnerability. However, previous research has demonstrated that many vulnerabilities correspond to more than one patch in practice. This phenomenon largely threatens the effectiveness of existing SVPI techniques, because they typically ignore the correlation between patches. In this paper, we propose SHIP, a Silent vulnerability patcH Identification approach suited for multiPle-patch scenarios, so that the patches corresponding to a vulnerability are no longer isolated islands. For a vulnerability item, we first obtain several highly relevant code commits by measuring heuristic features. We then employ a large language model (DeepSeek-V3) to predict both the link between a code commit and the vulnerability and the link between a pair of code commits, which yields candidate groups, each containing one or more code commits that could be patches of the vulnerability. Finally, we apply a max-pooling strategy to the features of the code commit(s) contained in each candidate group to determine the ranking of the groups, and output the Top-1 group.
The experimental results demonstrate the promise of SHIP: on a benchmark consisting of 4,631 vulnerability items, it achieves a Recall, Precision, and F1-Score of 84.30%, 59.14%, and 69.51%, respectively, outperforming the state-of-the-art SVPI technique by 37.54%, 28.71%, and 32.35%.
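To make the final ranking step of the abstract more concrete, the following is a minimal sketch of max-pooling over the feature vectors of the commits in each candidate group, followed by ranking the groups and returning the Top-1 group. It is an illustration under stated assumptions, not the authors' implementation: the function names `max_pool` and `rank_groups`, the commit identifiers, the feature values, and in particular the weighted-sum scoring of the pooled vector are all hypothetical, since the abstract does not specify how the pooled features are turned into a ranking score.

```python
from typing import Dict, List, Sequence


def max_pool(feature_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise maximum over the feature vectors of one candidate group."""
    return [max(column) for column in zip(*feature_vectors)]


def rank_groups(
    groups: Sequence[Sequence[str]],
    features: Dict[str, Sequence[float]],
    weights: Sequence[float],
) -> List[List[str]]:
    """Rank candidate groups by a score computed from their pooled features.

    ASSUMPTION: the score is a weighted sum of the pooled feature vector.
    The abstract only states that max-pooling determines the ranking; the
    actual scoring function used by SHIP is not given there.
    """
    scored = []
    for group in groups:
        pooled = max_pool([features[commit] for commit in group])
        score = sum(w * x for w, x in zip(weights, pooled))
        scored.append((score, list(group)))
    # Highest score first; the first element is the Top-1 group.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [group for _, group in scored]


# Hypothetical heuristic features for three candidate commits.
features = {
    "c1": [0.9, 0.2],
    "c2": [0.3, 0.8],
    "c3": [0.5, 0.5],
}
# Hypothetical candidate groups, e.g. as derived from LLM-predicted links.
groups = [["c1"], ["c1", "c2"], ["c3"]]

top1 = rank_groups(groups, features, weights=[1.0, 1.0])[0]
print(top1)
```

In this toy input, the two-commit group pools the strongest value of each feature from its members, so a group of complementary commits can outrank either commit taken alone. That is one plausible reading of why pooling over groups suits the multiple-patch setting.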