NDSS 2026

Achieving Zen: Combining Mathematical and Programmatic Deep Learning Model Representations for Attribution and Reuse

David Oygenblik, Dinko Dermendzhiev, Filippos Sofias, Mingxuan Yao, Haichuan Xu, Runze Zhang, Jeman Park, Amit Kumar Sikder, Brendan Saltaformaggio

Cited by: 1


Abstract

Existing work aims to convert black-box DL models into white-box ones through methodologies such as operator loop analysis [14], learning-based approaches [15], symbolic analysis [14], [15], or memory forensics [16]. Such approaches successfully recover an operator's mathematical representation, such as tensors, weights, and biases, and identify basic layer types based on this representation. However, we find that these techniques ignore the DL model's code, making model recovery, environment setup, and model re-execution or instrumentation impossible for models containing any customized code (as shown in Table V). Even approaches [14] that attempt to export recovered artifacts in the universal ONNX format [17] fail to enable model re-execution: although ONNX ships with prebuilt standard operators, it strictly requires a set of functions implementing any customized code or novel operators before the model can execute. Ultimately, this means that model-testing techniques that rely on instrumenting the model's code are inapplicable. Prior research [5], [7] and our preliminary study (§II-A) have shown that, without access to the instrumentable code of the deployed DL model, prior approaches fail to satisfy the assumptions necessary to enable white-box analysis.

Applying white-box analysis tools [5], [7], [8], [18]-[20] to proprietary DL models faces several challenges. First, enabling model reuse for code instrumentation is essential but hindered by the difficulty of recreating the correct execution environment, which prior work ignores. Second, environment setup relies on an investigator's knowledge of what model is being deployed, what dependencies it requires, and what framework it can be deployed in. To answer these questions, an investigator can attempt to attribute the model to its original base model, yet customization often obscures a model's lineage, making attribution difficult.
Relying solely on layer names, weights, and model graph structure for comparison is insufficient, as significant variations exist even within model families (shown in §IV-B1). Third, robust attribution and reuse require recovering a "fingerprint" of the model, which is the code that implements the DL model. Without this fingerprint, the necessary context

Abstract: Prior work has developed techniques capable of extracting deep learning (DL) models in universal formats from system memory or program binaries for security analysis. Unfortunately, such techniques ignore the recovery of the DL model's programmatic representation, which is required for model reuse and for any white-box analysis technique. Addressing this, we propose a novel recovery methodology and a prototype, ZEN, that automatically recovers the DL model's programmatic representation, complementing the recovery of the mathematical representation by prior work. ZEN identifies novel code in an unknown DL system relative to a base model and generates patches such that the recovered DL model can be reused. We evaluated ZEN on 21 SOTA DL models, including models across the language and vision domains, such as Llama 3 and YoloV10. ZEN successfully attributed custom models to their base models with 100% accuracy, enabling model reuse.
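
To make the attribute-then-patch idea concrete, the following is a minimal sketch and not ZEN's actual algorithm, which this excerpt does not describe. It assumes the recovered code and the candidate base models are available as plain source text, and it substitutes simple text similarity from Python's standard `difflib` module for the paper's fingerprinting. The model names (`toy_mlp`, `toy_conv`), the source snippets, and the helpers `attribute` and `make_patch` are all hypothetical.

```python
import difflib

# Hypothetical base-model sources (assumption: code is available as text).
BASES = {
    "toy_mlp": "def forward(x, w):\n    return relu(x @ w)\n",
    "toy_conv": "def forward(x, k):\n    return conv2d(x, k)\n",
}

# Hypothetical recovered source of a customized model.
CUSTOM = "def forward(x, w):\n    x = custom_norm(x)\n    return relu(x @ w)\n"


def attribute(custom_src: str, base_sources: dict) -> str:
    """Return the name of the base model whose source is most similar.

    Text similarity is only a stand-in for a real code fingerprint.
    """
    def score(name: str) -> float:
        matcher = difflib.SequenceMatcher(None, base_sources[name], custom_src)
        return matcher.ratio()

    return max(base_sources, key=score)


def make_patch(base_src: str, custom_src: str, name: str) -> str:
    """Express the custom (novel) code as a unified diff against the base."""
    diff = difflib.unified_diff(
        base_src.splitlines(keepends=True),
        custom_src.splitlines(keepends=True),
        fromfile=f"{name}/base.py",
        tofile=f"{name}/custom.py",
    )
    return "".join(diff)


best = attribute(CUSTOM, BASES)
patch = make_patch(BASES[best], CUSTOM, best)
print(best)
print(patch)
```

In this toy setting the diff isolates the single customized line, which is the kind of "novel code relative to a base model" the abstract refers to. A real system would need a far more robust comparison than raw text similarity, since the text notes that names, weights, and graph structure vary significantly even within a model family.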