ASE2024

TypeFSL: Type Prediction from Binaries via Inter-procedural Data-flow Analysis and Few-shot Learning

Zirui Song, Yutong Zhou, Shuaike Dong, Ke Zhang, Kehuan Zhang


Abstract

Type recovery in stripped binaries is a critical and challenging task in reverse engineering, because it underpins many security applications (e.g., vulnerability detection). Traditional analysis methods are limited by software complexity and by the new types that keep emerging in real-world projects. To address these limitations, researchers have explored machine learning methods. However, existing supervised learning approaches struggle with complicated and uncommon types, for which few training samples are available. In addition, no existing work captures fine-grained, inter-procedural features in binaries. In this paper, we present TypeFSL, a framework that addresses imbalanced type distributions by incorporating few-shot learning and captures inter-procedural semantics through program slicing. Evaluated on a dataset of 3,003,117 functions, TypeFSL achieves average accuracies of 77.9% and 84.6% across all architectures and optimization levels in 20-way 5-shot and 20-way 10-shot classification tasks, respectively. Our prototype outperforms existing techniques in both prediction accuracy and resistance to obfuscation. Finally, case studies demonstrate how TypeFSL predicts uncommon and complicated types in practical analysis.
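The "20-way 5-shot" and "10-shot" settings follow the standard N-way K-shot episode format: each task draws N type labels, provides K labelled examples per type (the support set), and asks the model to classify held-out queries among only those N types. The sketch below illustrates that format under stated assumptions. The helpers `sample_episode` and `nearest_prototype` and the toy feature vectors are hypothetical. The nearest-class-mean classifier is a common few-shot baseline in the style of prototypical networks and is not necessarily the model TypeFSL uses.

```python
import random
from collections import defaultdict


def sample_episode(samples, n_way, k_shot, n_query, rng):
    """Build one N-way K-shot episode from (features, label) pairs.

    Hypothetical helper: `samples` stands in for feature vectors labelled
    with their ground-truth type; this is not TypeFSL's actual API.
    """
    by_label = defaultdict(list)
    for features, label in samples:
        by_label[label].append(features)
    # Only types with enough examples for both support and query are usable.
    eligible = sorted(l for l, xs in by_label.items() if len(xs) >= k_shot + n_query)
    if len(eligible) < n_way:
        raise ValueError("not enough types with k_shot + n_query examples")
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for label in classes:
        # Sampling without replacement keeps support and query disjoint.
        picked = rng.sample(by_label[label], k_shot + n_query)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return classes, support, query


def nearest_prototype(support, x):
    """Assign x to the type whose support-set mean (prototype) is closest."""
    sums, counts = {}, defaultdict(int)
    for feats, label in support:
        if label not in sums:
            sums[label] = [0.0] * len(feats)
        sums[label] = [s + f for s, f in zip(sums[label], feats)]
        counts[label] += 1

    def sq_dist(label):
        proto = [s / counts[label] for s in sums[label]]
        return sum((a - b) ** 2 for a, b in zip(proto, x))

    return min(sums, key=sq_dist)


if __name__ == "__main__":
    # Toy data: 30 hypothetical types, 8 two-dimensional examples each.
    rng = random.Random(0)
    data = [([float(t), j * 0.01], f"type_{t}") for t in range(30) for j in range(8)]
    classes, support, query = sample_episode(data, n_way=20, k_shot=5, n_query=3, rng=rng)
    correct = sum(nearest_prototype(support, x) == y for x, y in query)
    print(f"{len(classes)}-way 5-shot episode accuracy: {correct / len(query):.2f}")
```

In a 20-way 5-shot episode, the support set holds 20 × 5 = 100 labelled examples. Switching to 10-shot doubles it to 200, which is consistent with the higher accuracy the abstract reports for that setting.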