ASE 2020
CCGraph: a PDG-based code clone detector with approximate graph matching
Yue Zou, Bihuan Ban, Yinxing Xue, Yun Xu
46 citations
Abstract
Software clone detection is an active research area that is important for software maintenance, bug detection, and related tasks. Two pieces of cloned code are similar or equivalent in the syntax or structure of their code representations. Code can be represented in many ways, such as tokens, ASTs, and PDGs. The PDG (Program Dependency Graph) of source code can contain both syntactic and structural information. However, most existing PDG-based tools are quite time-consuming and miss many clones because they detect code clones with exact graph matching based on subgraph isomorphism. In this paper, we propose a novel PDG-based code clone detector, CCGraph, that uses graph kernels. First, we normalize the structure of PDGs and design a two-stage filtering strategy that measures the characteristic vectors of the code. Then we detect code clones with an approximate graph matching algorithm based on a reformed WL (Weisfeiler-Lehman) graph kernel. Experimental results show that CCGraph retains high accuracy, achieves better recall and F1-score, and detects more semantic clones than two other related state-of-the-art tools. In addition, CCGraph is much more efficient than the existing PDG-based tools.
CCS CONCEPTS • Software and its engineering → Software maintenance tools.
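To illustrate the idea of approximate graph matching with a WL graph kernel, the following is a minimal sketch of the standard Weisfeiler-Lehman subtree kernel applied to two small labeled graphs. It is not the reformed kernel used by CCGraph, whose details are not given in the abstract. The function names (`wl_features`, `wl_similarity`), the node labels, the cosine normalization, and the toy graphs are all assumptions made for illustration.

```python
from collections import Counter


def wl_features(adj, labels, iterations=2):
    """Count WL subtree labels of a graph over several refinement rounds.

    adj    : dict mapping each node to a list of neighbouring nodes
             (every node must appear as a key).
    labels : dict mapping each node to its initial label (a string).
    """
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        refined = {}
        for node, neighbours in adj.items():
            # New label = own label plus the sorted multiset of neighbour labels.
            neighbour_labels = tuple(sorted(current[n] for n in neighbours))
            refined[node] = (current[node], neighbour_labels)
        current = refined
        features.update(current.values())
    return features


def wl_similarity(graph_a, graph_b, iterations=2):
    """Cosine-normalized WL subtree kernel between two (adj, labels) pairs."""
    fa = wl_features(graph_a[0], graph_a[1], iterations)
    fb = wl_features(graph_b[0], graph_b[1], iterations)
    dot = sum(fa[key] * fb[key] for key in fa.keys() & fb.keys())
    norm = (sum(v * v for v in fa.values()) * sum(v * v for v in fb.values())) ** 0.5
    return dot / norm if norm else 0.0


# Hypothetical toy "PDGs": nodes are statements, edges are dependencies.
graph_a = (
    {0: [1], 1: [2], 2: []},
    {0: "decl", 1: "assign", 2: "return"},
)
graph_b = (
    {0: [1], 1: [2], 2: [3], 3: []},
    {0: "decl", 1: "assign", 2: "call", 3: "return"},
)

score_same = wl_similarity(graph_a, graph_a)
score_diff = wl_similarity(graph_a, graph_b)
```

Unlike subgraph isomorphism, which only says whether one graph exactly contains the other, the kernel produces a graded similarity score. A detector could then report a clone pair when the score exceeds a chosen threshold; the abstract does not state such a threshold, so none is assumed here.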