WWW2025

Not All Benignware Are Alike: Enhancing Clean-Label Attacks on Malware Classifiers

Xutong Wang, Yun Feng, Bingsheng Bi, Yaqin Cao, Ze Jin, Xinyu Liu, Yuling Liu, Yunpeng Li

Abstract

Machine learning (ML) based malware classifiers are vulnerable to exploitation during the training phase because they must be retrained regularly on samples collected from the wild. Recent studies have highlighted the efficacy of backdoor attacks in the malware domain: attackers manipulate the model during training by injecting samples embedded with a specific trigger, causing the model to associate the trigger with a designated class and thereby evading detection. Although backdoor attacks have been studied extensively in computer vision, they have been largely overlooked in the malware domain. Unlike in computer vision, the threat model in the malware domain typically restricts attackers to clean-label attacks (i.e., attackers have no control over the labeling of poisoned data). However, clean-label attacks are generally less effective than corrupted-label attacks, which embed triggers and also change the sample labels to the target class. To address this limitation, we propose a simple yet effective method, Poisoning Malware-Similar Benignware (PMSB), which poisons benignware that resembles malware instead of randomly selected benignware, thereby approximating the corrupted-label scenario and enhancing the effectiveness of clean-label attacks. We also introduce three similarity measures, based on feature-based distance, distribution-based distance, and contribution-based difference, for selecting malware-similar benignware. Comprehensive evaluations across three trigger types and three datasets demonstrate the superiority and general applicability of PMSB.
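
The selection step described in the abstract can be illustrated with a minimal sketch on toy feature vectors. The abstract does not specify how any of the three similarity measures is computed, so everything here is an assumption made for illustration: Euclidean distance to the malware centroid stands in for a feature-based distance, and the function names `select_malware_similar_benign` and `apply_trigger`, the trigger positions, and the trigger values are hypothetical rather than taken from the paper.

```python
import numpy as np


def select_malware_similar_benign(X_benign, X_malware, k):
    """Return indices of the k benign rows closest to the malware centroid.

    Assumption: Euclidean distance to the centroid is only a stand-in for
    the paper's feature-based distance, which the abstract does not define.
    """
    centroid = X_malware.mean(axis=0)
    dists = np.linalg.norm(X_benign - centroid, axis=1)
    # Stable sort so that ties are broken by original row order.
    return np.argsort(dists, kind="stable")[:k]


def apply_trigger(X, trigger_idx, trigger_values):
    """Return a copy of X with the (hypothetical) trigger features overwritten."""
    X_poisoned = X.copy()
    X_poisoned[:, trigger_idx] = trigger_values
    return X_poisoned


# Toy data: three benign and two malware samples with two features each.
X_benign = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])
X_malware = np.array([[4.0, 4.0], [2.0, 2.0]])

# Pick the two benign samples nearest the malware centroid (3, 3).
chosen = select_malware_similar_benign(X_benign, X_malware, k=2)

# Clean-label setting: only features change; the labels stay "benign".
poisoned = apply_trigger(X_benign[chosen], trigger_idx=[0], trigger_values=[9.0])
```

Replacing the centroid distance with a distribution-based or contribution-based score would change only the body of `select_malware_similar_benign`; the surrounding poisoning step would stay the same under these assumptions.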