CCS 2023

Attack Some while Protecting Others: Selective Attack Strategies for Attacking and Protecting Multiple Concepts

Vibha Belavadi, Yan Zhou, Murat Kantarcioglu, Bhavani Thuraisingham

Abstract

Machine learning models are vulnerable to adversarial attacks. Existing research focuses on attack-only scenarios. In practice, one dataset may be used for learning different concepts, and the attacker may be incentivized to attack some concepts but protect the others. For example, the attacker might tamper with a profile image so that the "age" model predicts "young" while the "attractiveness" model still predicts "pretty". In this work, we empirically demonstrate that attacking the classifier for one learning task may negatively impact classifiers trained for other tasks on the same data. This raises an interesting research question: is it possible to attack one set of classifiers while protecting the others trained on the same data?