ICCV 2023

Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning

Kaijie Zhu, Xixu Hu, Jindong Wang, Xing Xie, Ge Yang

38 citations

Abstract

Deep neural networks are susceptible to adversarial examples, posing a significant security risk in critical applications. Adversarial Training (AT) is a well-established technique to enhance adversarial robustness, but it often comes at the cost of decreased generalization ability. This paper proposes Robustness Critical Fine-Tuning (RiFT), a novel approach to enhance generalization without compromising adversarial robustness. The core idea of RiFT is to exploit the redundant capacity for robustness by fine-tuning the adversarially trained model on its non-robust-critical module. To do so, we introduce module robust criticality (MRC), a measure that evaluates the significance of a given module to model robustness under worst-case weight perturbations. Using this measure, we identify the module with the lowest MRC value as the non-robust-critical module and fine-tune its weights to obtain fine-tuned weights. Subsequently, we linearly interpolate between the adversarially trained weights and fine-tuned weights to derive the optimal fine-tuned model weights. We demonstrate the efficacy of RiFT on ResNet18, ResNet34, and WideResNet34-10 models trained on CIFAR10, CIFAR100, and Tiny-ImageNet datasets. Our experiments show that RiFT can significantly improve both generalization and out-of-distribution robustness by around 1.5% while maintaining or even slightly enhancing adversarial robustness. Code is available at https://github.com/microsoft/robustlearn.
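
The two mechanical steps named in the abstract, selecting the module with the lowest MRC value and linearly interpolating between the adversarially trained and fine-tuned weights, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dictionary-of-lists weight layout, the module names, the MRC scores, and the coefficient `alpha` are all hypothetical and chosen only for illustration. Computing MRC itself and choosing the interpolation coefficient are not shown, since the abstract does not specify them.

```python
from typing import Dict, List

# Hypothetical weight layout: module name -> flat list of parameters.
Weights = Dict[str, List[float]]


def pick_non_robust_critical(mrc_scores: Dict[str, float]) -> str:
    """Return the name of the module with the lowest MRC score."""
    return min(mrc_scores, key=mrc_scores.get)


def interpolate(at_weights: Weights, ft_weights: Weights, alpha: float) -> Weights:
    """Per-parameter linear interpolation: (1 - alpha) * AT + alpha * fine-tuned.

    alpha = 0 returns the adversarially trained weights,
    alpha = 1 returns the fine-tuned weights.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        name: [
            (1.0 - alpha) * a + alpha * f
            for a, f in zip(at_weights[name], ft_weights[name])
        ]
        for name in at_weights
    }


# Made-up MRC scores and weights, for illustration only.
mrc_scores = {"layer1": 0.30, "layer2": 0.05, "layer3": 0.12}
target = pick_non_robust_critical(mrc_scores)

at_weights = {"layer1": [1.0, 2.0], "layer2": [0.0, 4.0]}
# Only the selected module differs after fine-tuning in this toy setup.
ft_weights = {"layer1": [1.0, 2.0], "layer2": [2.0, 0.0]}
merged = interpolate(at_weights, ft_weights, alpha=0.5)
```

In this toy setup only the selected module changes during fine-tuning, so interpolation leaves every other module at its adversarially trained values.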