NDSS 2018

VulDeePecker: A Deep Learning-Based System for Vulnerability Detection

Zhen Li, Deqing Zou, Shouhuai Xu, Xinyu Ou, Hai Jin, Sujuan Wang, Zhijun Deng, Yuyi Zhong

Cited 318 times

Abstract

Fine-grained software vulnerability detection is an important and challenging problem. Ideally, a detection system (or detector) not only should be able to detect whether or not a program contains vulnerabilities, but also should be able to pinpoint the type of a vulnerability in question. Existing vulnerability detection methods based on deep learning can detect the presence of vulnerabilities (i.e., addressing the binary classification or detection problem), but cannot pinpoint types of vulnerabilities (i.e., incapable of addressing multiclass classification). In this paper, we propose the first deep learning-based system for multiclass vulnerability detection, dubbed μVulDeePecker. The key insight underlying μVulDeePecker is the concept of code attention, which can capture information that can help pinpoint types of vulnerabilities, even when the samples are small. For this purpose, we create a dataset from scratch and use it to evaluate the effectiveness of μVulDeePecker. Experimental results show that μVulDeePecker is effective for multiclass vulnerability detection and that accommodating control-dependence (other than data-dependence) can lead to higher detection capabilities.