FSE2024

Towards Better Graph Neural Network-Based Fault Localization through Enhanced Code Representation

Md Nakhla Rafi, Dong Jae Kim, An Ran Chen, Tse-Hsun (Peter) Chen, Shaowei Wang

Cited by 15

Abstract

Automatic software fault localization plays an important role in software quality assurance by pinpointing faulty locations for easier debugging. Coverage-based fault localization is a commonly used technique that applies statistics to coverage spectra to rank code elements by suspiciousness score. However, statistics-based approaches built on fixed formulae are often rigid, which calls for learning-based techniques. Among these, Grace, a graph neural network (GNN)-based technique, has achieved state-of-the-art results because it preserves coverage spectra, i.e., test-to-source coverage relationships, in a precise graph representation enhanced with the abstract syntax tree (AST). This mitigates a limitation of other learning-based techniques, which compress the feature representation. However, such a representation does not scale: as software grows more complex, the coverage spectra and the AST graph grow with it, making the representation challenging to extend, let alone to train a graph neural network on in practice. In this work, we propose a new graph representation, DepGraph, that reduces the complexity of the graph representation by 70% in nodes and edges by integrating the interprocedural call graph into the graph representation of the code. Moreover, we integrate additional features (code change information) into the graph as attributes so that the model can leverage rich historical project data.
We evaluate DepGraph using Defects4J 2.0.0, and it outperforms Grace by locating 20% more faults in Top-1 and improving the Mean First Rank (MFR) and the Mean Average Rank (MAR) by over 50% while decreasing GPU memory usage by 44% and training/inference time by 85%. Additionally, in cross-project settings, DepGraph surpasses the state-of-the-art baseline with a 42% higher Top-1 accuracy, and 68% and 65% improvement in MFR and MAR, respectively. Our study demonstrates DepGraph's robustness, achieving state-of-the-art accuracy and scalability for future extension and adoption.
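The abstract does not define the reported metrics, so the following is a minimal sketch of how Top-N, MFR, and MAR are commonly computed in fault localization studies. It assumes that each fault comes with a suspiciousness score per method and a non-empty set of truly faulty methods that all appear in the ranking. The function names, the tie-breaking rule, and the toy data are hypothetical and are not taken from the DepGraph implementation.

```python
from statistics import mean


def ranks_of_faulty(scores, faulty):
    """Return the 1-based ranks of the faulty methods, best rank first.

    scores: dict mapping method name -> suspiciousness (higher = more suspicious).
    faulty: non-empty set of method names that are actually faulty.
    Ties are broken by method name, which is an arbitrary assumption.
    """
    ordered = sorted(scores, key=lambda m: (-scores[m], m))
    return sorted(i + 1 for i, m in enumerate(ordered) if m in faulty)


def evaluate(faults, n=1):
    """Compute (Top-n, MFR, MAR) over a list of (scores, faulty) pairs."""
    all_ranks = [ranks_of_faulty(scores, faulty) for scores, faulty in faults]
    # Top-n: number of faults with at least one faulty method ranked within n.
    top_n = sum(1 for r in all_ranks if r[0] <= n)
    # MFR: mean rank of the first (best-ranked) faulty method per fault.
    mfr = mean(r[0] for r in all_ranks)
    # MAR: mean, over faults, of the average rank of all faulty methods.
    mar = mean(mean(r) for r in all_ranks)
    return top_n, mfr, mar


# Toy data with made-up scores (hypothetical, for illustration only).
faults = [
    ({"A.f": 0.9, "A.g": 0.4, "B.h": 0.7}, {"A.f"}),
    ({"C.p": 0.2, "C.q": 0.8, "D.r": 0.5, "D.s": 0.1}, {"D.r", "D.s"}),
]
top1, mfr, mar = evaluate(faults)
print(top1, mfr, mar)
```

Under these definitions a higher Top-N is better, while lower MFR and MAR are better, so an "improvement" in MFR or MAR means the faulty methods move closer to the top of the ranked list.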