NeurIPS 2020

HYDRA: Pruning Adversarially Robust Neural Networks

Vikash Sehwag, Shiqi Wang, Prateek Mittal, Suman Jana

Cited by 233

Abstract

In safety-critical but computationally resource-constrained applications, deep learning faces two key challenges: lack of robustness against adversarial attacks and large neural network size (often millions of parameters). While the research community has extensively explored robust training and network pruning independently, each addressing one of these challenges, only a few recent works have studied them jointly. However, these works inherit a heuristic pruning strategy that was developed for benign training and performs poorly when integrated with robust training techniques, including adversarial training and verifiable robust training. To overcome this challenge, we propose to make pruning techniques aware of the robust training objective and let the training objective guide the search for which connections to prune. We realize this insight by formulating the pruning objective as an empirical risk minimization problem, which is solved efficiently using SGD. We demonstrate that our approach, titled HYDRA, achieves compressed networks with state-of-the-art benign and robust accuracy simultaneously. We demonstrate the success of our approach across the CIFAR-10, SVHN, and ImageNet datasets with four robust training techniques: iterative adversarial training, randomized smoothing, MixTrain, and CROWN-IBP. We also demonstrate the existence of highly robust sub-networks within non-robust networks. Our code and compressed networks are publicly available.

Key contributions: We make the following key contributions.

• We develop a novel pruning technique that is aware of the robust training objective by formulating it as an empirical risk minimization problem, which we solve efficiently with SGD. We show the generalizability of our formulation by considering multiple types of robust training objectives, including verifiable robustness. We employ an importance-score-based optimization technique with our proposed scaled initialization of importance scores, which is the key driver behind the success of our approach.
• We evaluate the proposed approach across four robust training objectives, namely iterative adversarial training [7, 30, 49], randomized smoothing [8, 7], MixTrain [38], and CROWN-IBP [48], on the CIFAR-10, SVHN, and ImageNet datasets with multiple network architectures. Notably, at a 99% connection pruning ratio, we achieve gains of up to 3.2, 11.2, and 17.8 percentage points in robust accuracy, while simultaneously achieving state-of-the-art benign accuracy, compared to previous works [34, 45, 15] on the ImageNet, CIFAR-10, and SVHN datasets, respectively.
• We also demonstrate the existence of highly robust sub-networks within non-robust or weakly robust networks. In particular, within empirically robust networks that have no verifiable robustness, we were able to find sub-networks with verified robust accuracy close to the state of the art.

2 Background and related work

Robust training. Robust training is one of the primary defenses against adversarial examples [5, 13, 6, 30, 3] and can be divided into two categories: adversarial training and verifiable robust training. The key objective of adversarial training is to minimize the training loss on adversarial