NeurIPS 2020

Certified Defense to Image Transformations via Randomized Smoothing

Marc Fischer, Maximilian Baader, Martin T. Vechev

Cited 76 times

Abstract

We extend randomized smoothing to cover parameterized transformations (e.g., rotations, translations) and certify robustness in the parameter space (e.g., rotation angle). This is particularly challenging as interpolation and rounding effects mean that image transformations do not compose, in turn preventing direct certification of the perturbed image (unlike certification with $\ell_p$ norms). We address this challenge by introducing three different kinds of defenses, each with a different guarantee (heuristic, distributional and individual) stemming from the method used to bound the interpolation error. Importantly, we show how individual certificates can be obtained via either statistical error bounds or efficient online inverse computation of the image transformation. We provide an implementation of all methods at https://github.com/eth-sri/transformation-smoothing.

Introduction

Deep neural networks are vulnerable to adversarial examples [1]: small changes that preserve semantics (e.g., $\ell_p$-noise or geometric transformations such as rotations) [2], but can affect the output of a network in undesirable ways. As a result, there has been substantial recent interest in methods which aim to ensure the network is certifiably robust to adversarial examples [3-13].

Certification guarantees. There are two principal robustness guarantees a certified defense can provide at inference time: (i) the (standard) distributional guarantee, where a robustness score is computed offline on the test set to be interpreted in expectation for images drawn from the data distribution, and (ii) an individual guarantee, where a certificate is computed online for the (possibly perturbed) input. The choice of guarantee depends on the application and regulatory constraints.
Guarantees with $\ell_p$ norms. When considering $\ell_p$ norms, existing certification methods can be directly used to obtain either of the above two guarantees: for an image $x$ and adversarial noise $\delta$ with $\|\delta\|_p < r$, proving that a classifier $f$ is $r$-robust around $x' := x + \delta$ is enough to guarantee $f(x) = f(x')$. That is, it suffices to prove robustness of a perturbed input in order to certify that the perturbation did not change the classification, as the $r$-ball around $x'$ includes $x$.

Key challenge: guarantees for geometric perturbations. Perhaps counterintuitively, however, for more complex perturbations such as geometric transformations, proving robustness around an image $x'$ via existing methods (e.g., [9] [10] [11] [12]) does not imply that $f(x) = f(x')$ for the original image $x$. To illustrate this issue, consider the rotation $R_\gamma$ of an image $x$ by angle $\gamma$, followed by an interpolation $I$. Certifying that the classification of the rotated image $x' := I \circ R_\gamma(x)$ for $\gamma < r$ is robust under further rotations $I \circ R_\beta$ for $\beta < r$ is not sufficient to imply that $x$ and $x'$ classify the same, as rotating $x'$ back by $\beta = -\gamma$ does not return the original image $x$ due to interpolation. A central challenge then is to develop techniques that are able to handle more involved perturbations.

34th Conference on Neural Information Processing Systems (NeurIPS 2020)
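The claim that rotation followed by interpolation does not compose, and in particular that rotating back by the negated angle does not return the original image, can be checked numerically. The following is a minimal sketch, not the authors' implementation: the helper names `rotate_bilinear` and `max_abs_diff_center`, the 16x16 random test image, the choice of bilinear interpolation, and the 30 degree angle are all assumptions made for illustration. Errors are measured only on a central disc so that zero-filled border pixels do not contribute and the discrepancy is due to interpolation alone.

```python
import math
import random


def rotate_bilinear(img, angle_deg):
    """Rotate a square grayscale image (list of rows) about its center.

    Each output pixel is inverse-mapped into the source image and sampled
    with bilinear interpolation; pixels mapped outside the image stay 0.
    (Hypothetical helper for illustration only.)
    """
    n = len(img)
    c = (n - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            x, y = j - c, i - c
            # Source coordinates of output pixel (i, j).
            sx = cos_t * x + sin_t * y + c
            sy = -sin_t * x + cos_t * y + c
            if not (0.0 <= sx <= n - 1 and 0.0 <= sy <= n - 1):
                continue
            # Clamp so that the 2x2 neighborhood stays inside the image.
            x0 = min(int(math.floor(sx)), n - 2)
            y0 = min(int(math.floor(sy)), n - 2)
            fx, fy = sx - x0, sy - y0
            top = img[y0][x0] * (1 - fx) + img[y0][x0 + 1] * fx
            bot = img[y0 + 1][x0] * (1 - fx) + img[y0 + 1][x0 + 1] * fx
            out[i][j] = top * (1 - fy) + bot * fy
    return out


def max_abs_diff_center(a, b, radius):
    """Largest absolute pixel difference within `radius` of the center."""
    n = len(a)
    c = (n - 1) / 2.0
    return max(
        abs(a[i][j] - b[i][j])
        for i in range(n)
        for j in range(n)
        if math.hypot(i - c, j - c) <= radius
    )


random.seed(0)
x = [[random.random() for _ in range(16)] for _ in range(16)]
gamma = 30.0  # assumed rotation angle in degrees

x_rot = rotate_bilinear(x, gamma)            # x' = I o R_gamma (x)
x_back = rotate_bilinear(x_rot, -gamma)      # I o R_{-gamma} (x')
x_twice = rotate_bilinear(rotate_bilinear(x, gamma / 2), gamma / 2)

# Both values would be 0 if interpolated rotations composed exactly.
print("round-trip error :", max_abs_diff_center(x, x_back, 3.0))
print("composition error:", max_abs_diff_center(x_rot, x_twice, 3.0))
```

Both printed errors are nonzero for this image, so a certificate that holds for all further rotations of the rotated image says nothing directly about the original image, since the original is not among the images reachable from the rotated one.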