NDSS 2023

BinaryInferno: A Semantic-Driven Approach to Field Inference for Binary Message Formats

Jared Chandler, Adam Wick, Kathleen Fisher

Abstract

We present BinaryInferno, a fully automatic tool for reverse engineering binary message formats. Given a set of messages with the same format, the tool uses an ensemble of detectors to infer a collection of partial descriptions and then automatically integrates the partial descriptions into a semantically meaningful description that can be used to parse future packets with the same format. As its ensemble, BinaryInferno uses a modular and extensible set of targeted detectors, including detectors for identifying atomic data types such as IEEE floats, timestamps, and integer length fields; for finding boundaries between adjacent fields using Shannon entropy; and for discovering variable-length sequences by searching for common serialization idioms. We evaluate BinaryInferno’s performance on sets of packets drawn from 10 binary protocols. Our semantic-driven approach significantly decreases false positive rates and increases precision when compared to the previous state of the art. For top-level protocols we identify field boundaries with an average precision of 0.69, an average recall of 0.73, and an average false positive rate of 0.04, significantly outperforming five other state-of-the-art protocol reverse engineering tools on the same data sets (listed as precision, recall, false positive rate): Awre (0.18, 0.03, 0.04), FieldHunter (0.68, 0.37, 0.01), Nemesys (0.31, 0.44, 0.11), Netplier (0.29, 0.75, 0.22), and Netzob (0.57, 0.42, 0.03). We believe our improvements in precision and false positive rates represent what our target user most wants: semantically meaningful descriptions with fewer false positives.
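To make the entropy-based boundary idea concrete, the following is a minimal sketch of one way per-offset Shannon entropy could be computed across a set of same-format messages and used to flag candidate field boundaries. It is an illustration only, not the detector implemented in the tool: the function names, the toy messages, and the 1.0-bit threshold are all assumptions introduced here.

```python
import math
from collections import Counter


def byte_entropy(values):
    """Shannon entropy, in bits, of a sequence of byte values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_by_offset(messages):
    """Entropy of the byte at each offset, measured across all messages.

    Only offsets present in every message are considered.
    """
    width = min(len(m) for m in messages)
    return [byte_entropy([m[i] for m in messages]) for i in range(width)]


def candidate_boundaries(entropies, threshold=1.0):
    """Offsets whose entropy differs from the previous offset by >= threshold bits.

    The threshold is a hypothetical tuning parameter chosen for this example.
    """
    return [
        i
        for i in range(1, len(entropies))
        if abs(entropies[i] - entropies[i - 1]) >= threshold
    ]


# Toy messages (hypothetical): a constant 2-byte magic value,
# a 1-byte counter, and a constant 1-byte trailer.
messages = [bytes([0xCA, 0xFE, n, 0x00]) for n in range(4)]

entropies = entropy_by_offset(messages)
boundaries = candidate_boundaries(entropies)
print(boundaries)
```

On this toy input the constant bytes have zero entropy and the counter byte has two bits of entropy, so the sketch flags offsets 2 and 3, which are the boundaries between the three fields. Real traffic is far noisier, which is one reason a single entropy heuristic is combined with other detectors in an ensemble.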