NeurIPS 2022

Enhance the Visual Representation via Discrete Adversarial Training

Xiaofeng Mao, Yuefeng Chen, Ranjie Duan, Yao Zhu, Gege Qi, Shaokai Ye, Xiaodan Li, Rong Zhang, Hui Xue

44 citations

Abstract

Adversarial Training (AT), which is commonly accepted as one of the most effective defenses against adversarial examples, can substantially harm standard performance and thus has limited usefulness in industrial-scale production and applications. Surprisingly, the opposite holds in Natural Language Processing (NLP) tasks, where AT can even benefit generalization. We observe that the merit of AT in NLP tasks could derive from the discrete and symbolic input space. To borrow this advantage from NLP-style AT, we propose Discrete Adversarial Training (DAT). DAT leverages VQGAN to convert image data into discrete, text-like inputs, i.e., visual words. It then minimizes the maximal risk on such discrete images under symbolic adversarial perturbations. We further give an explanation from the perspective of distribution to demonstrate the effectiveness of DAT. As a plug-and-play technique for enhancing visual representations, DAT achieves significant improvements on multiple tasks, including image classification, object detection, and self-supervised learning. In particular, a model pre-trained with Masked Auto-Encoding (MAE) and fine-tuned with our DAT without extra data achieves 31.40 mCE on ImageNet-C and 32.77% top-1 accuracy on Stylized-ImageNet, setting a new state of the art. The code will be available at https://github.com/alibaba/easyrobust.

A possible way towards robust machine perception is Adversarial Training (AT) [5], which automatically finds failure cases of DNNs and augments the training data online with these cases to fix "bugs". With online augmentation of adversarial examples, AT greatly enhances adversarial robustness and helps in learning perceptually-aligned representations [6] with good interpretability [7, 8] and transferability [9]. However, AT is double-edged: it also degrades standard performance because of problematic regularization [10].
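For reference, the min-max objective behind AT, and a discrete counterpart in the spirit of the DAT description in the abstract, can be sketched as follows. The notation is assumed here for illustration and is not taken from the source: $f_{\theta}$ is the model, $\mathcal{L}$ the training loss, $\mathcal{D}$ the data distribution, $Q$ a VQGAN encoder and quantizer mapping an image to a sequence of codebook indices (visual words), $G$ the VQGAN decoder, and $\mathcal{S}(z)$ a hypothetical set of allowed symbolic perturbations of the index sequence $z$.

```latex
% Standard AT: worst-case loss over a continuous pixel-space perturbation \delta
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[ \max_{\|\delta\|_{p} \le \epsilon}
        \mathcal{L}\big(f_{\theta}(x+\delta),\, y\big) \Big]

% Discrete variant (assumed form): worst-case loss over perturbed visual words z'
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[ \max_{z' \in \mathcal{S}(Q(x))}
        \mathcal{L}\big(f_{\theta}(G(z')),\, y\big) \Big]
```

The only difference between the two lines is the inner maximization: the first searches a continuous $\ell_p$ ball around $x$, while the second searches over discrete visual-word sequences. How the inner maximization is solved in practice, and how $\mathcal{S}$ is constrained, is not specified in this section.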
Such problematic regularization over-smooths the decision boundaries and enlarges indecisive regions. Surprisingly, previous works [11, 12] observe that AT behaves in the opposite way in Natural Language Processing (NLP) tasks. By automatically finding adversarial textual inputs, AT does not hurt accuracy and even benefits both the generalization and robustness of language models. This phenomenon motivates us to consider whether the merit of NLP-style AT can be