ASE 2025
Breaking the Traffic Barrier: Unveiling Multi-Format of Protocols via Autonomous Program Exploration
Dingzhao Xue, Yibo Qu, Bowen Jiang, Xin Chen, Shuaizong Si, Shichao Lv, Zhiqiang Shi, Limin Sun
Abstract
Protocol reverse engineering (PRE) aims to infer the message formats of unknown protocols. Existing techniques, whether network-trace-based or execution-trace-based, face two main limitations: they rely on the quality and scale of traffic datasets, which often leads to low accuracy and poor generalization; and they do not adequately account for the multi-format characteristic prevalent in real-world protocols (i.e., the same protocol may support multiple different formats). To address these challenges, we propose ProbePRE, a PRE tool that performs multi-format extraction on protocol handlers by autonomously generating packets. ProbePRE employs three key techniques: (1) an execution tracing strategy enhanced with implicit data flow analysis to obtain more detailed execution information; (2) constraint extraction methods tailored to different program structures so that generated packets pass protocol validation; and (3) a novel constraint combination algorithm that constructs valid packets to steer the protocol handler along diverse protocol parsing paths. In our experimental evaluation, we compared ProbePRE with 4 state-of-the-art PRE tools in terms of field segmentation accuracy. ProbePRE achieved an F1 score of 0.88, significantly outperforming existing methods. Furthermore, evaluations on 6 protocol handlers showed that ProbePRE attained 83% completeness in multi-format extraction tasks. Notably, in basic block coverage tests, ProbePRE achieved a 67% improvement over traditional traffic-dataset-based methods, which supports the effectiveness of its path exploration capabilities.
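To make the field segmentation metric concrete, the following minimal sketch shows one common way an F1 score over field boundaries can be computed. It is an illustration under stated assumptions, not necessarily the exact metric definition used in the evaluation: the function name `boundary_f1` and the representation of a format as a list of field lengths in bytes are hypothetical choices made for this example.

```python
def boundary_f1(true_fields, inferred_fields):
    """F1 over interior field boundaries (assumed metric, for illustration).

    Both arguments are lists of field lengths in bytes that describe the
    same message, e.g. [2, 2, 4] for an 8-byte message with three fields.
    """
    def boundaries(lengths):
        # Collect the byte offsets where one field ends and the next begins.
        offsets, pos = set(), 0
        for n in lengths[:-1]:  # the end of the message is not a boundary
            pos += n
            offsets.add(pos)
        return offsets

    truth = boundaries(true_fields)
    guess = boundaries(inferred_fields)
    true_positives = len(truth & guess)
    if true_positives == 0:
        return 0.0

    precision = true_positives / len(guess)  # correct share of inferred cuts
    recall = true_positives / len(truth)     # recovered share of true cuts
    return 2 * precision * recall / (precision + recall)
```

For example, if the true format is `[2, 2, 4]` and a tool infers `[2, 6]`, the single inferred boundary is correct (precision 1.0) but one of the two true boundaries is missed (recall 0.5), so the score under this assumed definition is 2/3.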