ASE2025

Backdoors in Code Summarizers: How Bad Is It?

Chenyu Wang, Zhou Yang, Yaniv Harel, David Lo

摘要

Large Language Models for Code (Code LLMs) are increasingly employed in software development. However, studies have recently shown that these models are vulnerable to backdoor attacks: when a trigger (a specific input pattern) appears in the input, the backdoor will be activated and cause the model to generate malicious outputs desired by the attacker. Researchers have designed various triggers and demonstrated the feasibility of implanting backdoors by poisoning a fraction of the training data (known as data poisoning). Some basic conclusions have been made, such as backdoors becoming easier to implant when attackers modify more training data. However, existing research has not explored other factors influencing backdoor attacks on Code LLMs, such as training batch size, epoch number, and the broader design space for triggers, e.g., trigger length. To bridge this gap, we use the code summarization task as an example to perform a comprehensive empirical study that systematically investigates the factors affecting backdoor effectiveness and understands the extent of the threat posed by backdoor attacks on Code LLMs. Three categories of factors are considered: data, model, and inference, revealing findings overlooked in previous studies for practitioners to mitigate backdoor threats. For example, Code LLM developers can adopt higher batch sizes with fewer epochs appropriately. Users of code models can adjust inference parameters, such as using a higher temperature or a larger top-k, appropriately. Future backdoor defense can prioritize the inspection of rarer and longer tokens, since they are more effective if they are indeed triggers. Since these non-backdoor design factors can also greatly sway attack performance, future backdoor studies should fully report settings, control key factors, and systematically vary them across configurations. What's more, we find that the prevailing consensus-that attacks are ineffective at extremely low poisoning rates-is incorrect. The absolute number of poisoned samples matters as well. Specifically, poisoning just 20 out of 454,451 samples (0.004% poisoning rate-far below the minimum setting of 0.1% considered in prior Code LLM backdoor attack studies) successfully implants backdoors! Moreover, the common defense is incapable of removing even a single poisoned sample from this poisoned dataset, highlighting the urgent need for defense mechanisms against extremely low poisoning rate settings.