ASE 2024
RCFG2Vec: Considering Long-Distance Dependency for Binary Code Similarity Detection
Weilong Li, Jintian Lu, Ruizhi Xiao, Pengfei Shao, Shuyuan Jin
Cited by 4
Abstract
Binary code similarity detection (BCSD), a fundamental technique in software security, has various applications, including malware family detection, known vulnerability detection, and code plagiarism detection. Recent deep learning-based BCSD approaches have demonstrated promising performance. However, they face two significant challenges that limit detection performance. First, most approaches that use sequence networks (such as RNNs and Transformers) rely on coarse-grained tokenization methods, which result in a large vocabulary and a severe out-of-vocabulary (OOV) problem. Second, methods based on control flow graphs (CFGs) typically use variants of graph convolutional networks, which consider only local structural information and discard long-distance dependencies between basic blocks.
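The second challenge can be illustrated with a minimal sketch. It is not the paper's method: it only tracks which basic blocks can influence a given block after a fixed number of neighbor-aggregation rounds, whereas a real graph convolutional network aggregates feature vectors. The helper names (`propagate`, `receptive_field`) and the six-block chain-shaped CFG are hypothetical and chosen purely for illustration.

```python
def propagate(adj, reach):
    """One aggregation round: each block absorbs what its direct neighbors already see."""
    return {
        node: reach[node].union(*(reach[nbr] for nbr in adj[node]))
        for node in adj
    }


def receptive_field(adj, num_layers):
    """Return, for every block, the set of blocks that can influence it after num_layers rounds."""
    reach = {node: {node} for node in adj}  # before any round, a block only sees itself
    for _ in range(num_layers):
        reach = propagate(adj, reach)
    return reach


# Hypothetical chain-shaped CFG with six basic blocks: 0 - 1 - 2 - 3 - 4 - 5
# (edges treated as undirected for simplicity; this is an assumption of the sketch).
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}

fields = receptive_field(chain, num_layers=2)
print(sorted(fields[0]))  # [0, 1, 2]
```

With two rounds, block 0 receives information only from blocks within two hops, so blocks 3 to 5 never contribute to its representation. In this chain, five rounds would be needed before block 0 is influenced by block 5, which is the sense in which shallow local aggregation misses long-distance dependencies.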