WWW2026

Unequal Vulnerability: The Differential Impact of Label Flipping Attacks Across Classes

Pinlong Zhao, Mengyang Li, Pengfei Jiao, Huijun Tang, Ou Wu

Abstract

Label flipping attacks are a potent and practical threat to the integrity of machine learning models. While extensive research has focused on designing sophisticated attack and defense mechanisms, the underlying factors that govern a model's susceptibility remain underexplored. This paper reveals a critical phenomenon: the impact of label flipping attacks differs markedly across classes and is strongly correlated with the intrinsic confusability between the source and target classes. We provide a rigorous theoretical analysis showing that a lower standardized separation between classes leads to greater vulnerability. Grounded in this insight, we propose Confusability-Aware Contrastive Learning (CACL), a targeted defense that maximizes the feature-space separation of the most vulnerable class pairs. Extensive experiments validate the strong link between class separability and vulnerability, and show that CACL significantly mitigates the attack's impact while providing superior protection for the most susceptible classes. Our code is available at https://github.com/Pinlong-Zhao/Unequal-Vulnerability.
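To make "standardized separation" concrete, the sketch below ranks class pairs by how far apart their means are relative to their spread. It assumes one common definition (not necessarily the paper's): project both classes onto the direction joining their means, then divide the mean distance by the pooled within-class standard deviation along that direction, as in Cohen's d. The helper names `standardized_separation` and `rank_confusable_pairs` are hypothetical and are not taken from the authors' repository. Under the abstract's claim, the pair with the smallest value would be the most vulnerable to label flipping.

```python
import numpy as np


def standardized_separation(X, y, a, b):
    """Assumed definition: mean distance between classes a and b along their
    mean-difference direction, divided by the pooled within-class standard
    deviation along that same direction (the paper's formula may differ)."""
    Xa, Xb = X[y == a], X[y == b]
    diff = Xa.mean(axis=0) - Xb.mean(axis=0)
    dist = np.linalg.norm(diff)
    if dist == 0:
        return 0.0
    w = diff / dist  # unit vector joining the two class means
    # Project each class onto w and pool the per-class variances.
    pooled_var = 0.5 * ((Xa @ w).var(ddof=1) + (Xb @ w).var(ddof=1))
    return float(dist / np.sqrt(pooled_var))


def rank_confusable_pairs(X, y):
    """Return (separation, class_a, class_b) tuples, smallest separation
    (i.e. most confusable, hypothetically most vulnerable) first."""
    classes = np.unique(y)
    pairs = []
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            pairs.append((standardized_separation(X, y, a, b), int(a), int(b)))
    return sorted(pairs)


# Synthetic demo: classes 0 and 1 overlap heavily, class 2 is far away.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [1.0, 0.0], [6.0, 0.0]])
X = np.vstack([rng.normal(c, 1.0, size=(200, 2)) for c in centers])
y = np.repeat([0, 1, 2], 200)

ranked = rank_confusable_pairs(X, y)
for sep, a, b in ranked:
    print(f"classes ({a}, {b}): standardized separation = {sep:.2f}")
```

In this toy setup the pair (0, 1) should come first, since its means are about one standard deviation apart; a defense in the spirit of CACL would prioritize separating such pairs in feature space.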