ACL 2025

ESF: Efficient Sensitive Fingerprinting for Black-Box Tamper Detection of Large Language Models

Xiaofan Bai, Pingyi Hu, Xiaojing Ma, Linchen Yu, Dongmei Zhang, Qi Zhang, Bin Benjamin Zhu

Abstract

The rapid adoption of large language models (LLMs) in diverse applications has intensified concerns over their security and integrity, particularly in cloud environments where users cannot access internal model parameters. One critical threat is model tampering, which can compromise LLM behavior and reliability. However, traditional tamper detection methods, designed for deterministic classification models, are inadequate for LLMs due to their output randomness and massive parameter spaces. In this paper, we introduce Efficient Sensitive Fingerprinting (ESF), the first fingerprinting method tailored for black-box tamper detection of LLMs. ESF generates fingerprint samples by optimizing output sensitivity at selected detection token positions and leverages Randomness-Set Consistency Checking (RSCC) to accommodate inherent output randomness. Additionally, we propose a novel Max Coverage Strategy (MCS) to select an optimal set of fingerprint samples that maximizes joint sensitivity to tampering. Grounded in a rigorous theoretical framework, ESF is computationally efficient and scalable to large models. Extensive experiments on state-of-the-art LLMs demonstrate that ESF reliably detects tampering, including fine-tuning, model compression, and backdoor injection, with detection rates of at least 99.2% using only 5 fingerprint samples, offering a robust solution for securing cloud-based AI systems.
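The abstract names two mechanisms, a set-based consistency check and a coverage-maximizing sample selection, without giving their details. The following is only a minimal illustrative sketch of how such ideas could look, not the authors' method. It assumes a generic greedy max-coverage heuristic and a simple set-membership test. All names (`greedy_max_coverage`, `is_consistent`, `detect_tampering`), the token-level reference sets, and the notion of "tampering variants covered by a sample" are hypothetical.

```python
from typing import Dict, List, Set


def greedy_max_coverage(coverage: Dict[str, Set[str]], k: int) -> List[str]:
    """Greedily pick up to k candidates whose covered sets have a large union.

    `coverage` maps a candidate fingerprint name to the (hypothetical) set of
    tampering variants that the candidate is sensitive to.
    """
    chosen: List[str] = []
    covered: Set[str] = set()
    remaining = dict(coverage)
    for _ in range(k):
        if not remaining:
            break
        # Pick the candidate adding the most not-yet-covered variants;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no candidate adds new coverage
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


def is_consistent(observed_token: str, reference_set: Set[str]) -> bool:
    """Assumed set-based check: the output passes if it belongs to the set of
    tokens recorded from the untampered model at the detection position."""
    return observed_token in reference_set


def detect_tampering(
    observations: Dict[str, str], reference_sets: Dict[str, Set[str]]
) -> bool:
    """Flag tampering if any fingerprint's observed token leaves its reference set."""
    return any(
        not is_consistent(token, reference_sets[name])
        for name, token in observations.items()
    )
```

A usage sketch under the same assumptions: with `coverage = {"a": {"t1", "t2"}, "b": {"t2", "t3"}, "c": {"t3"}}` and `k = 2`, the greedy pass selects `"a"` and then `"b"`, which together cover all three hypothetical variants. Checking membership in a set rather than equality with a single expected output is what lets a check of this kind tolerate sampling randomness.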