NeurIPS2023

Adversarial Examples Might be Avoidable: The Role of Data Concentration in Adversarial Robustness

Ambar Pal, Jeremias Sulam, René Vidal

被引用 11 次

摘要

The susceptibility of modern machine learning classifiers to adversarial examples has motivated theoretical results suggesting that these might be unavoidable. However, these results can be too general to be applicable to natural data distributions. Indeed, humans are quite robust for tasks involving vision. This apparent conflict motivates a deeper dive into the question: Are adversarial examples truly unavoidable? In this work, we theoretically demonstrate that a key property of the data distribution -concentration on small-volume subsets of the input space -determines whether a robust classifier exists. We further demonstrate that, for a data distribution concentrated on a union of low-dimensional linear subspaces, utilizing structure in data naturally leads to classifiers that enjoy data-dependent polyhedral robustness guarantees, improving upon methods for provable certification in certain regimes. Introduction, Motivation and Contributions Research in adversarial learning has shown that traditional neural network based classification models are prone to anomalous behaviour when their inputs are modified by tiny, human-imperceptible perturbations. Such perturbations, called adversarial examples, lead to a large degradation in the accuracy of classifiers [55] . This behavior is problematic when such classification models are deployed in security sensitive applications. Accordingly, researchers have and continue to come up with defenses against such adversarial attacks for neural networks. Such defenses [49, 60, 42, 22] modify the training algorithm, alter the network weights, or employ preprocessing to obtain classifiers that have improved empirical performance against adversarially corrupted inputs. However, many of these defenses have been later broken by new adaptive attacks [1, 8] . This motivated recent impossibility results for adversarial defenses, which aim to show that all defenses admit adversarial examples. While initially such results were shown for specially parameterized data distributions [18] , they were subsequently expanded to cover general data distributions on the unit sphere and the unit cube [48] , as well as for distributions over more general manifolds [12] . On the other hand, we humans are an example of a classifier capable of very good (albeit imperfect [17] ) robust accuracy against ℓ 2 -bounded attacks for natural image classification. Even more, a large body of recent work has constructed certified defenses [11, 63, 10, 29, 19, 54] which obtain non-trivial performance guarantees under adversarially perturbed inputs for common datasets like MNIST, CIFAR-10 and ImageNet. This apparent contention between impossibility results and the existence of robust classifiers for natural datasets indicates that the bigger picture is more nuanced, and motivates a closer look at the impossibility results for adversarial examples. Our first contribution is to show that these results can be circumvented by data distributions whose mass concentrates on small regions of the input space. This naturally leads to the question of whether such a construction is necessary for adversarial robustness. We answer this question in the affirmative, formally proving that a successful defense exists only when the data distribution concentrates on an 37th Conference on Neural Information Processing Systems (NeurIPS 2023).