ICML 2022

Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them

Florian Tramèr

Cited 82 times

Abstract

Making classifiers robust to adversarial examples is challenging. Thus, many works tackle the seemingly easier task of detecting perturbed inputs. We show a barrier towards this goal. We prove a hardness reduction between detection and classification of adversarial examples: given a robust detector for attacks at distance ε (in some metric), we show how to build a similarly robust (but computationally inefficient) classifier for attacks at distance ε/2. Our reduction is computationally inefficient, but preserves the sample complexity of the original detector. The reduction thus cannot be directly used to build practical classifiers. Instead, it is a useful sanity check to test whether empirical detection results imply something much stronger than the authors presumably anticipated (namely a highly robust and data-efficient classifier). To illustrate, we revisit 14 empirical detector defenses published over the past years. For 12 of the 14 defenses, we show that the claimed detection results imply an inefficient classifier with robustness far beyond the state-of-the-art.
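The abstract does not spell out the construction, so the following is only a toy sketch of one way such a reduction could work, under stated assumptions. It assumes a detector that either outputs a label or rejects, and that is robust at distance ε in the sense that no point within ε of a clean input receives a wrong label (it may only be rejected). The classifier then exhaustively searches the ball of radius ε/2 around its input for any point the detector accepts and returns that point's label. The integer-line domain, the function names `toy_detector` and `classify_via_detector`, and the fallback label are all hypothetical and chosen for illustration.

```python
from typing import Callable, Optional

Label = str


def toy_detector(z: int) -> Optional[Label]:
    """Hypothetical detector-classifier on the integer line.

    Returns a label, or None to reject the input as adversarial.
    Clean inputs are assumed to be 0 (label "A") and 10 (label "B").
    """
    if abs(z - 0) <= 1:
        return "A"
    if abs(z - 10) <= 1:
        return "B"
    return None  # reject everything far from the clean inputs


def classify_via_detector(
    f: Callable[[int], Optional[Label]],
    z: int,
    eps: int,
    default: Label = "A",
) -> Label:
    """Brute-force classifier built from a detector (illustrative assumption).

    Searches every point within eps/2 of z and returns the label of the
    first one the detector accepts. The exhaustive search is what makes
    this kind of construction computationally inefficient in general.
    """
    half = eps // 2
    for candidate in range(z - half, z + half + 1):
        label = f(candidate)
        if label is not None:
            return label
    return default  # arbitrary output when every nearby point is rejected


EPS = 4  # the toy detector never mislabels a point within 4 of a clean input
predictions = {z: classify_via_detector(toy_detector, z, EPS) for z in (-2, 0, 2, 8, 12)}
print(predictions)
```

In this toy setting, any input within ε/2 = 2 of a clean point has that clean point inside its search ball, and every accepted point in the ball lies within ε = 4 of the clean point, so the returned label matches the clean label. No training data is touched beyond what the detector already used, which is consistent with the abstract's remark that sample complexity is preserved.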