WWW2026

LLMQuA: Practical Backdoor Injection on Large Language Model Quantization

Xiangxiang Chen, Peixin Zhang, Jun Sun, Jin Song Dong, Wenhai Wang, Jingyi Wang

Abstract

Quantization is widely used to enable local deployment of large language models (LLMs) on resource-constrained devices. Recent work (e.g., QuRA) shows that quantization can be exploited to implant backdoors by manipulating rounding decisions. However, this attack has been evaluated only on small models and does not directly apply to LLMs because of three key constraints: (1) calibration sets are small and task-agnostic, leaving little data for poisoning; (2) quantization proceeds layer by layer, restricting adversarial access to global representations; and (3) quantization pipelines provide no gradient access, which blocks gradient-based attacks. We propose LLMQuA, a practical quantization-phase backdoor attack tailored to the LLM setting. LLMQuA (i) injects backdoors via data-efficient knowledge editing with a few source–target token pairs, (ii) optimizes quantization parameters layer-locally to preserve activation distributions, and (iii) operates without gradient access by directly distorting quantized weights to reprogram token semantics. We evaluate LLMQuA on representative LLMs in two important attack scenarios: evading content moderation and inducing systematic refusal of benign user queries. Empirically, LLMQuA reduces model moderation accuracy by up to 67.91% on correctly classified samples and induces refusal-to-answer behavior for over 90% of targeted queries, with only negligible average degradation in overall model utility. Finally, we test LLMQuA against a range of deployment-phase defenses: many either fail to detect it reliably or require substantially higher computational or operational costs to mitigate it. These results expose a practical, low-overhead supply-chain threat to quantized LLM deployments and motivate deployment-aware integrity checks in LLM quantization workflows.
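The rounding manipulation mentioned in the abstract rests on a general property of uniform quantization: each weight can be rounded down or up, and either choice stays within one quantization step of the original value. The sketch below illustrates only that property. It is not the LLMQuA or QuRA procedure, and the function names, the 4-bit signed range, the scale, and the example weights are all illustrative assumptions.

```python
import math

QMIN, QMAX = -8, 7  # assumed 4-bit signed integer range


def quantize(weights, scale, round_up_mask=None):
    """Uniformly quantize weights with a shared scale.

    round_up_mask is None  -> round to nearest (the usual default)
    round_up_mask[i] True  -> round weight i up (ceil)
    round_up_mask[i] False -> round weight i down (floor)
    """
    quantized = []
    for i, w in enumerate(weights):
        x = w / scale
        if round_up_mask is None:
            v = math.floor(x + 0.5)
        elif round_up_mask[i]:
            v = math.ceil(x)
        else:
            v = math.floor(x)
        quantized.append(max(QMIN, min(QMAX, v)))  # clamp to the integer grid
    return quantized


def dequantize(quantized, scale):
    """Map integer codes back to floating-point weights."""
    return [v * scale for v in quantized]


weights = [0.26, -0.11, 0.49, 0.03]
scale = 0.1

# Default behavior: every weight goes to its nearest grid point.
nearest = quantize(weights, scale)

# Alternative rounding directions chosen per weight (hypothetical mask).
flipped = quantize(weights, scale, round_up_mask=[False, False, True, True])

# Largest per-weight deviation from the original value for each choice.
max_err_nearest = max(abs(a - b) for a, b in zip(dequantize(nearest, scale), weights))
max_err_flipped = max(abs(a - b) for a, b in zip(dequantize(flipped, scale), weights))
```

Both outputs are valid quantizations of the same weights: the second deviates more than round-to-nearest, yet each weight still lies within one step (`scale`) of its original value. This is why a check limited to per-weight quantization error would not, on its own, distinguish the two.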