S&P2023

AI-Guardian: Defeating Adversarial Attacks using Backdoors

Hong Zhu, Shengzhi Zhang, Kai Chen

Abstract

Deep neural networks (DNNs) have been widely used in many fields due to their increasingly high accuracy. However, they are also vulnerable to adversarial attacks, posing a serious threat to security-critical applications such as autonomous driving, remote diagnosis, etc. Existing solutions are limited in detecting/preventing such attacks, and also impacting the performance on the original tasks. In this paper, we present AI-Guardian, a novel approach to defeating adversarial attacks that leverages intentionally embedded backdoors to fail the adversarial perturbations and maintain the performance of the original main task. We extensively evaluate AI-Guardian using five popular adversarial example generation approaches, and experimental results demonstrate its efficacy in defeating adversarial attacks. Specifically, AI-Guardian reduces the attack success rate from 97.3% to 3.2%, which outperforms the state-of-the-art works by 30.9%, with only a 0.9% decline on the clean data accuracy. Furthermore, AI-Guardian introduces only 0.36% overhead to the model prediction time, almost negligible in most cases.