NDSS2016
discovRE: Efficient Cross-Architecture Identification of Bugs in Binary Code
Sebastian Eschweiler, Khaled Yakdan, Elmar Gerhards-Padilla
被引用 342 次
摘要
The identification of security-critical vulnerabilities is a key for protecting computer systems. Being able to perform this process at the binary level is very important given that many software projects are closed-source. Even if the source code is available, compilation may create a mismatch between the source code and the binary code that is executed by the processor, causing analyses that are performed on source code to fail at detecting certain bugs and thus potential vulnerabilities. Existing approaches to find bugs in binary code 1) use dynamic analysis, which is difficult for firmware; 2) handle only a single architecture; or 3) use semantic similarity, which is very slow when analyzing large code bases. In this paper, we present a new approach to efficiently search for similar functions in binary code. We use this method to identify known bugs in binaries as follows: starting with a vulnerable binary function, we identify similar functions in other binaries across different compilers, optimization levels, operating systems, and CPU architectures. The main idea is to compute similarity between functions based on the structure of the corresponding control flow graphs. To minimize this costly computation, we employ an efficient pre-filter based on numeric features to quickly identify a small set of candidate functions. This allows us to efficiently search for similar functions in large code bases. We have designed and implemented a prototype of our approach, called discovRE, that supports four instruction set architectures (x86, x64, ARM, MIPS). We show that discovRE is four orders of magnitude faster than the state-of-the-art academic approach for cross-architecture bug search in binaries. We also show that we can identify Heartbleed and POODLE vulnerabilities in an Android system image that contains over 130,000 native ARM functions in about 80 milliseconds. Permission to freely reproduce all or part of this paper for noncommercial purposes is granted provided that copies bear this notice and the full citation on the first page. Reproduction for commercial purposes is strictly prohibited without the prior written consent of the Internet Society, the first-named author (for reproduction of an entire paper only), and the author's employer if the paper was prepared within the scope of employment.