ICSE2025

Boosting Code-line-level Defect Prediction with Spectrum Information and Causality Analysis

Shiyu Sun, Yanhui Li, Lin Chen, Yuming Zhou, Jianhua Zhao

被引用 1 次

摘要

Code-line-Ievel defect prediction (CLDP) is an effective technique to incorporate comprehensive measures for buggy line identification to optimize efforts in Software Quality Assurance activities. Most CLDP methods either consider the textual information of the code or rely merely on file-level label information, which have not fully leveraged the essential information in the CLDP context, with historical code-line-level labels being incredibly overlooked in their application. Due to the vast number of code lines and the sparsity of the tokens they contain, leveraging historical code-line-level label information remains a significant challenge. To address this issue, we propose a novel CLDP method, Spectrum infOrmation and caUsality aNalysis based coDe-line-level defect prediction (SOUND). SOUND incorporates two key ideas: (a) it introduces a spectrum information perspective, utilizing labels from historical defective lines to quantify the contribution of tokens to line-level defects, and (b) it applies causal analysis to obtain a more systematic and comprehensive understanding of the causal relationships between tokens and defects. After conducting a comprehensive study involving 142 releases across 19 software projects, the experimental results demonstrate that our method significantly outperforms existing state-of-the-art (SOTA) CLDP baseline methods in terms of its ability to rank defective lines under three indicators, IFA, Recall@Top20%LOC, and Effort@Top20%Recall. Notably, in terms of IFA, our method achieves a score of 0 in most cases, indicating that the first line in the ranking list generated by our method is actually defective, significantly enhancing its practicality.