NeurIPS 2023

Adversarial Training from Mean Field Perspective

Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki

Abstract

Although adversarial training is known to be effective against adversarial examples, training dynamics are not well understood. In this study, we present the first theoretical analysis of adversarial training in random deep neural networks without any assumptions on data distributions. We introduce a new theoretical framework based on mean field theory, which addresses the limitations of existing mean field-based approaches. Based on this framework, we derive (empirically tight) upper bounds of $\ell_q$ norm-based adversarial loss with $\ell_p$ norm-based adversarial examples for various values of $p$ and $q$. Moreover, we prove that networks without shortcuts are generally not adversarially trainable and that adversarial training reduces network capacity. We also show that network width alleviates these issues. Furthermore, we present the various impacts of the input and output dimensions on the upper bounds and time evolution of the weight variance.

shortcuts [100]. Moreover, the theory has been applied to analyze network representation power [49, 71, 101]. However, existing mean field-based analysis cannot handle the properties of an entire network and input-parameter dependence, which is a drawback for some deep learning methods, e.g., adversarial training. Thus, we propose a new framework to address these limitations.

Adversarial training. Various questions related to adversarial training have been theoretically addressed by several studies, including the robustness-accuracy trade-off [29, 47, 75, 76, 92, 113], generalization gap [6, 50, 102, 107], sample complexity [1, 14, 62, 80, 110], large model requirement [63], and enhanced transfer learning performance [27]. However, these results are obtained in limited settings (e.g., Gaussian data and linear classifiers) and cannot be easily extended to deep neural networks or realistic data distributions. To explore more general settings, recent studies have used neural tangent kernel theory [4, 46, 55].
In the kernel regime, adversarial training, even with a heuristic attack, finds a robust network [35, 115]. In our study, we investigate adversarial training dynamics from a mean field perspective, covering general multilayered networks with or without shortcuts and without assumptions about data distributions.

Preliminaries

Setting

Notation is summarized in Tab. A2. For an integer $n \in \mathbb{N}$, let $[n] := \{1, \ldots, n\}$. In this study, we focus on random deep neural networks with ReLU-like activations, called random ReLU-like networks. This is formally defined as follows:

Definition 3.1 (ReLU-like network). A network is called a ReLU-like network if all its activation functions are $\phi(z) := uz$ for $z \geq 0$ and $vz$ for $z < 0$, with $u, v \in \mathbb{R}$.

ReLU-like activations [33, 57] are widely used in theoretical and practical applications [42, 45, 53, 85, 109]. In Appx. K, we extend our theorems to networks with Lipschitz continuous activations.

A ReLU-like network, $f : \mathbb{R}^d \to \mathbb{R}^K$, comprises $L \in \mathbb{N}$ trainable layers and two non-trainable layers for adjusting input and output dimensions. The input layer projects $x_{\mathrm{in}} \in \mathbb{R}^d$ to an $N$-dimensional vector $x^{(0)} \in \mathbb{R}^N$ using the random matrix $P_{\mathrm{in}} \in \mathbb{R}^{N \times d}$. Subsequently, $L$ consecutive affine transformations and activations are applied by $g : \mathbb{R}^N \to \mathbb{R}^N$. Then, $g(x^{(0)})$ is multiplied by a random matrix $P_{\mathrm{out}} \in \mathbb{R}^{K \times N}$ to obtain the output vector $f(x_{\mathrm{in}})$. Finally, the network function is $f(x_{\mathrm{in}}) := P_{\mathrm{out}}\, g(P_{\mathrm{in}} x_{\mathrm{in}})$. We assume that $d$ and $K$ are sufficiently large, and each entry of $P_{\mathrm{in}}$ and $P_{\mathrm{out}}$ is i.i.d. and sampled from the Gaussians $\mathcal{N}(0, 1/d)$ and $\mathcal{N}(0, 1/N)$, respectively. An L-layer neural network g comprises the weights W (l) = (W