ICLR 2023
Fundamental limits on the robustness of image classifiers
Zheng Dai, David Gifford
Abstract
Many recent works have shown that adversarial examples that fool classifiers can be found by minimally perturbing a normal input. Recent theoretical results, starting with Gilmer et al. (2018b), show that if the inputs are drawn from a concentrated metric probability space, then adversarial examples with small perturbations are inevitable. A concentrated space has the property that any subset with Ω(1) measure (e.g., 1/100), according to the imposed distribution, has small distance to almost all (e.g., 99/100) of the points in the space. It is not clear, however, whether these theoretical results apply to actual distributions that arise in practice, such as images. This paper presents a method, proven to converge to the actual concentration, for empirically measuring and bounding the concentration of a concrete dataset. We use it to empirically estimate the intrinsic robustness to ℓ∞ perturbations of several image classification benchmarks.

* Equal contribution. The same work is also presented in the ICLR 2019 Safe Machine Learning workshop.
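To make the notion of concentration in the abstract concrete, the following is a minimal sketch, not the method of the paper. It assumes a finite sample standing in for the data distribution and one fixed subset of that sample, and it reports the fraction of sample points that lie within ℓ∞ distance eps of the subset. The names `linf_distance` and `expansion_measure` and the toy data are hypothetical and chosen only for illustration. A concentrated space is one in which this fraction is close to 1 for small eps, for every subset of non-negligible measure; the sketch checks a single subset and does not search over subsets.

```python
from typing import List, Sequence


def linf_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """l_inf distance: the largest absolute coordinate-wise difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def expansion_measure(
    points: List[Sequence[float]],
    in_subset: List[bool],
    eps: float,
) -> float:
    """Empirical measure of the eps-expansion of a subset.

    Returns the fraction of sample points whose l_inf distance to at
    least one subset point is at most eps. The sample is treated as the
    whole distribution, which is an assumption of this sketch.
    """
    subset = [p for p, flag in zip(points, in_subset) if flag]
    if not subset:
        return 0.0
    covered = sum(
        1
        for p in points
        if any(linf_distance(p, q) <= eps for q in subset)
    )
    return covered / len(points)


# Toy data (hypothetical): five 2-D points, with the first one as the subset.
points = [(0.0, 0.0), (0.1, 0.3), (0.2, 0.1), (0.5, 0.5), (0.9, 0.2)]
in_subset = [True, False, False, False, False]

# The expansion grows with eps; the subset itself has measure 1/5.
small = expansion_measure(points, in_subset, 0.25)
large = expansion_measure(points, in_subset, 0.5)
```

In the adversarial setting, the subset plays the role of a classifier's error region: the sample points inside the eps-expansion are those that can be moved into the error region by a perturbation of ℓ∞ size at most eps.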