NeurIPS 2023

Asymmetric Certified Robustness via Feature-Convex Neural Networks

Samuel Pfrommer, Brendon G. Anderson, Julien Piet, Somayeh Sojoudi

Cited by 9

Abstract

Real-world adversarial attacks on machine learning models often feature an asymmetric structure wherein adversaries only attempt to induce false negatives (e.g., classify a spam email as not spam). We formalize the asymmetric robustness certification problem and correspondingly present the feature-convex neural network architecture, which composes an input-convex neural network (ICNN) with a Lipschitz continuous feature map in order to achieve asymmetric adversarial robustness. We consider the aforementioned binary setting with one "sensitive" class, and for this class we prove deterministic, closed-form, and easily computable certified robust radii for arbitrary ℓ_p-norms. We theoretically justify the use of these models by extending the universal approximation theorem for ICNN regression to the classification setting, and proving a lower bound on the probability that such models perfectly fit even unstructured uniformly distributed data in sufficiently high dimensions. Experiments on Malimg malware classification and subsets of the MNIST, Fashion-MNIST, and CIFAR-10 datasets show that feature-convex classifiers attain substantial certified ℓ_1, ℓ_2, and ℓ_∞-radii while being far more computationally efficient than competitive baselines.

* Equal contribution.
Code for reproducing our results is available on GitHub.
37th Conference on Neural Information Processing Systems (NeurIPS 2023).