ASE 2024

RCFG2Vec: Considering Long-Distance Dependency for Binary Code Similarity Detection

Weilong Li, Jintian Lu, Ruizhi Xiao, Pengfei Shao, Shuyuan Jin

Cited by 4

Abstract

Binary code similarity detection (BCSD), a fundamental technique in software security, has various applications, including malware family detection, known vulnerability detection, and code plagiarism detection. Recent deep learning-based BCSD approaches have demonstrated promising performance. However, they face two significant challenges that limit detection performance. First, most approaches that use sequence networks (such as RNNs and Transformers) rely on coarse-grained tokenization, which results in a large vocabulary and a severe out-of-vocabulary (OOV) problem. Second, control flow graph (CFG)-based methods typically use variants of graph convolutional networks, which consider only local structural information and discard long-distance dependencies between basic blocks.
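
The second limitation can be made concrete with a small sketch. It is not taken from the paper: the six-block chain CFG and the helper `receptive_field` are hypothetical, and the graph is treated as undirected for message passing. Under those assumptions, it shows that a stack of k neighbor-aggregation layers lets a basic block see only blocks at most k hops away.

```python
def receptive_field(adjacency, node, num_layers):
    """Return the nodes whose features can reach `node` after
    `num_layers` rounds of one-hop neighbor aggregation."""
    reached = {node}
    frontier = {node}
    for _ in range(num_layers):
        # Each layer extends the reachable set by exactly one hop.
        frontier = {nbr for cur in frontier for nbr in adjacency[cur]} - reached
        reached |= frontier
    return reached


# Hypothetical CFG: six basic blocks in a chain, 0 - 1 - 2 - 3 - 4 - 5.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}

two_layer = receptive_field(chain, 0, 2)   # blocks visible to block 0 with 2 layers
five_layer = receptive_field(chain, 0, 5)  # blocks visible to block 0 with 5 layers

print(sorted(two_layer))
print(sorted(five_layer))
```

In this toy graph, block 0 needs five layers before block 5 influences its representation, which is the kind of long-distance dependency a shallow graph convolutional network leaves out.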