AAAI 2024

BertRLFuzzer: A BERT and Reinforcement Learning Based Fuzzer (Student Abstract)

Piyush Jha, Joseph Scott, Jaya Sriram Ganeshna, Mudit Singh, Vijay Ganesh

Abstract

We present BertRLFuzzer, a novel BERT and Reinforcement Learning (RL) based fuzzer aimed at finding security vulnerabilities in Web applications. BertRLFuzzer works as follows: given a list of seed inputs, the fuzzer performs grammar-adhering and attack-provoking mutation operations on them to generate candidate attack vectors. The key insight of BertRLFuzzer is the combined use of two machine learning concepts. The first is semi-supervised learning with language models (e.g., BERT), which enables BertRLFuzzer to learn (relevant fragments of) the grammar of a victim application as well as attack patterns, without requiring the user to specify them explicitly. The second is the use of RL, with the BERT model as the agent, to guide the fuzzer to efficiently learn grammar-adhering and attack-provoking mutation operators. The RL-guided feedback loop enables BertRLFuzzer to automatically search the space of attack vectors to exploit the weaknesses of the given victim application without the need to create labeled training data. Furthermore, these two features together make BertRLFuzzer extensible, i.e., the user can extend BertRLFuzzer to a variety of victim applications and attack vectors automatically (i.e., without explicitly modifying the fuzzer or providing a grammar). To establish the efficacy of BertRLFuzzer, we compare it against a total of 13 black box and white box fuzzers over a benchmark of 9 victim websites: 7 machine learning-based black box fuzzers (Deep-SQLi, DeepFuzz, DQN fuzzer, and modified versions of Deep-XSS, DeepFix, GRU-PPO, and Multi-head DQN), 3 grammar-preserving fuzzers (BIOFuzz, SQLMap, and a baseline mutator), a white box fuzzer (Ardilla), a random mutator, and a random fuzzer.
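The RL-guided mutation loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not BertRLFuzzer's implementation: the BERT agent is swapped for a plain epsilon-greedy value table over a few hand-written mutation operators (a bandit-style simplification of RL), and the victim application is a toy oracle that flags a payload as an attack when it opens with a quote and contains an `OR 1=1` tautology. All names (`MUTATIONS`, `toy_victim`, `rl_fuzz`) and hyperparameters (`steps`, `epsilon`, `alpha`) are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical mutation operators over SQL-injection-style payloads.
# BertRLFuzzer learns its mutations; these are hand-written stand-ins.
MUTATIONS: Dict[str, Callable[[str], str]] = {
    "append_tautology": lambda s: s + " OR 1=1",
    "prepend_quote": lambda s: "'" + s,
    "append_comment": lambda s: s + " --",
    "uppercase": lambda s: s.upper(),
}


def toy_victim(payload: str) -> bool:
    """Hypothetical oracle standing in for a victim web application:
    reports an attack when the payload breaks out of a quoted string
    and injects a tautology."""
    return payload.startswith("'") and "OR 1=1" in payload


def rl_fuzz(
    seeds: List[str],
    victim: Callable[[str], bool],
    steps: int = 200,
    epsilon: float = 0.2,
    alpha: float = 0.1,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], Dict[str, float]]:
    """Epsilon-greedy value learning over mutation operators."""
    rng = rng or random.Random(0)
    q = {name: 0.0 for name in MUTATIONS}  # estimated reward per operator
    pool = list(seeds)                     # inputs available for mutation
    attacks: List[str] = []
    for _ in range(steps):
        seed = rng.choice(pool)
        # Explore a random operator with probability epsilon, else exploit.
        if rng.random() < epsilon:
            op = rng.choice(list(MUTATIONS))
        else:
            op = max(q, key=q.get)
        candidate = MUTATIONS[op](seed)
        # The reward comes from the victim's response, not from labeled data.
        reward = 1.0 if victim(candidate) else 0.0
        q[op] += alpha * (reward - q[op])  # incremental value update
        if reward:
            attacks.append(candidate)
        else:
            pool.append(candidate)  # keep non-attacks as new seeds
    return attacks, q


if __name__ == "__main__":
    found, values = rl_fuzz(["admin"], toy_victim)
    print(len(found), "attacks; operator values:", values)
```

The point this sketch tries to capture is the feedback loop: the only training signal is whether the victim was exploited, so no labeled data is needed, and operators whose outputs trigger attacks are chosen more often over time. A value table like this cannot learn grammar fragments or attack patterns, which, per the abstract, is what the BERT agent contributes.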
We observed a significant improvement in time to first attack (54% less than the nearest competing tool), time to find all vulnerabilities (40-60% less than the nearest competing tool), and attack rate (4.4% more attack vectors generated than the nearest competing tool). Our experiments show that the combination of the BERT model and RL-based learning makes BertRLFuzzer an effective, adaptive, easy-to-use, automatic, and extensible fuzzer.
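The three reported metrics could be computed from a fuzzing-run log along these lines. The sketch assumes definitions the abstract does not spell out: each generated input is logged with a timestamp, a success flag, and (for successes) a hypothetical vulnerability label; attack rate is taken as the fraction of generated inputs that are successful attacks; and a figure such as "54% less" is read as a relative reduction against the competing tool's time. `Event` and all helper functions are illustrative names, not part of BertRLFuzzer.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Event:
    t: float                       # seconds since the fuzzing run started
    is_attack: bool                # True if this generated input was a successful attack
    vuln_id: Optional[str] = None  # hypothetical label of the vulnerability it triggered


def time_to_first_attack(log: List[Event]) -> Optional[float]:
    """Earliest timestamp of a successful attack, or None if there was none."""
    times = [e.t for e in log if e.is_attack]
    return min(times) if times else None


def time_to_all_vulns(log: List[Event], known_vulns: Iterable[str]) -> Optional[float]:
    """Time at which every known vulnerability has been triggered at least once."""
    remaining = set(known_vulns)
    if not remaining:
        return 0.0
    for e in sorted(log, key=lambda ev: ev.t):
        if e.is_attack and e.vuln_id in remaining:
            remaining.discard(e.vuln_id)
            if not remaining:
                return e.t
    return None  # some vulnerability was never found


def attack_rate(log: List[Event]) -> float:
    """Fraction of generated inputs that were successful attacks (assumed definition)."""
    return sum(e.is_attack for e in log) / len(log) if log else 0.0


def relative_reduction(ours: float, baseline: float) -> float:
    """Fractional time saved relative to the baseline, e.g. 0.54 means 54% less."""
    return (baseline - ours) / baseline
```

For example, if a competing tool needs 100 s to its first attack and another tool needs 46 s, `relative_reduction(46.0, 100.0)` gives 0.54, i.e., 54% less time; these are toy numbers, not measurements from the paper.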