NDSS 2026

vSim: Semantics-Aware Value Extraction for Efficient Binary Code Similarity Analysis

Huaijin Wang, Zhiqiang Lin


Abstract

Binary Code Similarity Analysis (BCSA) plays a vital role in many security tasks, including malware analysis, vulnerability detection, and software supply chain security. While numerous BCSA techniques have been proposed over the past decade, few leverage the semantics of register and memory values for comparison, despite promising initial results. Existing value-based approaches often focus narrowly on values that remain invariant across compilation settings, thereby overlooking a broader spectrum of semantically rich information. In this paper, we identify three core challenges limiting the effectiveness of value-based BCSA: (1) unscalable value extraction that fails to cover diverse value-producing behaviors, (2) insufficient noise filtering that allows semantically irrelevant artifacts (e.g., global addresses) to dominate, and (3) inefficient comparison that makes value-based matching expensive and brittle. To make value-based BCSA practical at scale, we propose VSIM, a novel framework that systematically captures values computed from register and memory operations, filters out semantically irrelevant values (e.g., global addresses), and normalizes and propagates the remaining values to enable robust and scalable similarity analysis. Extensive evaluation shows that VSIM consistently outperforms state-of-the-art BCSA systems in accuracy, robustness, and scalability, and generalizes across architectures and toolchains, delivering reliable results on diverse real-world datasets.

semantics from raw bytes [21] or assembly code [4], [5], [19], (2) comparing binary code based on input-output (I/O) equivalence [7], [22], [23], (3) analyzing program states post-execution [8], [12], [24], and (4) examining specific invariant values [3], [25].
These methods typically rely on either traditional program analysis techniques or machine learning (ML), each inheriting distinct limitations such as poor scalability [26], limited code coverage [27], [28], and inability to generalize well to out-of-distribution (OOD) samples [29], [30]. Meanwhile, in practice, many security tasks, particularly those related to software supply chains [14], [16], [31], [32], require analyzing massive binary corpora efficiently. Consequently, recent BCSA efforts increasingly adopt ML-based embedding approaches for rapid semantic comparisons. Yet, such ML models frequently encounter robustness issues due to diverse optimizations and prevalent OOD binaries, leading to degraded accuracy [5], [33].

Motivated by these limitations, we propose a non-ML approach that extracts robust, semantics-aware values to approximate binary code semantics. More specifically, we propose VSIM, a novel value-based BCSA framework for accurate, robust, and scalable analysis. Unlike previous approaches that narrowly focus on source-level semantic values (e.g., return results) [3], [23], [25], [34], VSIM systematically captures and utilizes the intermediate register and memory values encountered during under-constrained symbolic execution [26]. These intermediate values offer crucial semantic information often ignored by prior techniques. VSIM further addresses three key challenges inherent to value-based BCSA: unscalable value extraction, semantics-aware value selection, and inefficient value comparison, demonstrating the feasibility and effectiveness of leveraging values for BCSA.

VSIM's workflow consists of four key steps: (1) It employs a customized under-constrained symbolic execution engine to scalably extract register and memory values, capturing essential semantic details at the basic-block level. (2) It applies six heuristics, inspired by common disassembly practices [35]-[37], to retain semantics-aware values and discard irrelevant ones (e.g., global variable addresses). (3) The semantics-aware values are then normalized and concretized to construct function fingerprints, enabling efficient similarity comparison. (4) These fingerprints are further enhanced by propagating callees' fingerprints to callers and incorporating the distinguishability of captured values, significantly boosting robustness and accuracy.

To evaluate VSIM, we perform extensive experiments across three large datasets covering diverse compilation scenarios.
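
To make steps (2) through (4) of the workflow concrete, the following is a minimal sketch, not VSIM's actual implementation. It assumes that a fingerprint is a set of normalized concrete values, that propagation is a union over the call graph, and that distinguishability can be approximated by an inverse-document-frequency weight. All function names, the bit-width normalization, and the `is_noise` predicate are hypothetical stand-ins; the six heuristics and the symbolic execution engine of step (1) are not modeled.

```python
import math
from collections import Counter


def normalize(value, width_bits=64):
    """Assumed normalization: reduce a value modulo 2**width_bits so that
    sign-extended and zero-extended forms of the same bits coincide."""
    return value & ((1 << width_bits) - 1)


def build_fingerprint(values, is_noise=lambda v: False):
    """Drop values flagged by a (hypothetical) noise predicate and keep the
    normalized remainder as an order-insensitive set."""
    return frozenset(normalize(v) for v in values if not is_noise(v))


def propagate(fingerprints, call_graph):
    """Merge callee fingerprints into callers until nothing changes.
    The fixpoint loop also terminates on recursive call graphs."""
    merged = {fn: set(fp) for fn, fp in fingerprints.items()}
    changed = True
    while changed:
        changed = False
        for caller, callees in call_graph.items():
            target = merged.setdefault(caller, set())
            for callee in callees:
                extra = merged.get(callee, set()) - target
                if extra:
                    target |= extra
                    changed = True
    return {fn: frozenset(fp) for fn, fp in merged.items()}


def value_weights(corpus):
    """IDF-style weights (an assumption): values that occur in fewer
    functions are treated as more distinguishing."""
    n = len(corpus)
    doc_freq = Counter(v for fp in corpus.values() for v in fp)
    return {v: math.log((1 + n) / (1 + c)) + 1.0 for v, c in doc_freq.items()}


def similarity(fp_a, fp_b, weights):
    """Weighted Jaccard similarity between two fingerprints, in [0, 1]."""
    union = fp_a | fp_b
    if not union:
        return 0.0

    def total(values):
        return sum(weights.get(v, 1.0) for v in values)

    return total(fp_a & fp_b) / total(union)


# Toy data: two builds of the same function differ only in a global address.
def in_data_section(v):
    # Hypothetical address range standing in for "global variable addresses".
    return 0x600000 <= v < 0x700000


raw_values = {
    "parse_v1": [0x10, 0xFF, 0x601040],
    "parse_v2": [0x10, 0xFF, 0x602080],
    "helper": [0x2A],
    "unrelated": [0x7, 0x8],
}
calls = {"parse_v1": ["helper"], "parse_v2": ["helper"]}

corpus = propagate(
    {fn: build_fingerprint(vs, in_data_section) for fn, vs in raw_values.items()},
    calls,
)
weights = value_weights(corpus)
score_same = similarity(corpus["parse_v1"], corpus["parse_v2"], weights)
score_diff = similarity(corpus["parse_v1"], corpus["unrelated"], weights)
```

In this toy corpus, the two `parse` variants end up with identical fingerprints once the address-like values are filtered and the helper's value is propagated, while the unrelated function shares no values with them. Set-based fingerprints keep the comparison cheap, which is the motivation for the sketch; the paper's actual fingerprint representation and weighting scheme may differ.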