FSE2024

Bin2Summary: Beyond Function Name Prediction in Stripped Binaries with Functionality-Specific Code Embeddings

Zirui Song, Jiongyi Chen, Kehuan Zhang

Cited by 2

Abstract

Nowadays, closed-source software only with stripped binaries still dominates the ecosystem, which brings obstacles to understanding the functionalities of the software and further conducting the security analysis. With such an urgent need, research has traditionally focused on predicting function names, which can only provide fragmented and abbreviated information about functionality. To advance the state-of-the-art, this paper presents Bin2Summary to automatically summarize the functionality of the function in stripped binaries with natural language sentences. Specifically, the proposed framework includes a functionality-specific code embedding module to facilitate fine-grained similarity detection and an attention-based seq2seq model to generate summaries in natural language. Based on 16 widely-used projects (e.g., Coreutils), we have evaluated Bin2Summary with 38,167 functions, which are filtered from 162,406 functions, and all of them have a high-quality comment.
Bin2Summary achieves 0.728 in precision and 0.729 in recall on our datasets, and the functionality-specific embedding module can improve the existing assembly language model by up to 109.5% and 109.9% in precision and recall. Meanwhile, the experiments demonstrated that Bin2Summary has outstanding transferability in analyzing the cross-architecture (i.e., in x64 and x86) and cross-environment (i.e., in Cygwin and MSYS2) binaries. Finally, the case study illustrates how Bin2Summary outperforms the existing works in providing functionality summaries with abundant semantics beyond function names.