NeurIPS 2024
Exploring and Exploiting the Asymmetric Valley of Deep Neural Networks
Xin-Chun Li, Jin-Lin Tang, Bo Zhang, Lan Li, De-Chuan Zhan
Abstract
Exploring the loss landscape offers insights into the inherent principles of deep neural networks (DNNs). Recent work suggests that valleys can be asymmetric, in addition to being flat or sharp, yet it does not thoroughly examine the causes or implications of this asymmetry. Our study methodically explores the factors affecting the symmetry of DNN valleys, encompassing (1) the dataset, network architecture, initialization, and hyperparameters that influence the convergence point; and (2) the magnitude and direction of the noise used for 1D visualization. Our main observation is that the degree of sign consistency between the noise and the convergence point is a critical indicator of valley symmetry. Theoretical insights from the perspectives of the ReLU activation and the softmax function could explain this phenomenon. Our discovery leads to a novel understanding and new applications in the scenario of Model Fusion: (1) the efficacy of interpolating separate models correlates significantly with their sign consistency ratio, and (2) imposing sign alignment during federated learning emerges as an innovative approach for model parameter alignment.

This paper analyzes in depth the factors that may affect the valley symmetry of DNNs. Previous analyses of valley shape primarily use the 1D interpolation θ_f + λϵ, where θ_f represents the minimum solution, ϵ denotes a random noise direction, and λ is the scalar step along that direction. As shown in Fig. 1, we believe that valley symmetry depends on both the convergence solution and the noise, each of which is influenced by several factors. The most significant innovation in our research is considering the effect of the noise direction on valley visualization, as previous work has simply used Gaussian noise.

38th Conference on Neural Information Processing Systems (NeurIPS 2024).
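To make the 1D interpolation θ_f + λϵ and the notion of sign consistency concrete, the following is a minimal NumPy sketch rather than the authors' implementation. It assumes that the sign consistency ratio is the fraction of coordinates where the noise and the parameters share the same sign; the exact definition used in the paper may differ. The helper names (`sign_consistency_ratio`, `sign_aligned_noise`, `loss_along_direction`) and the quadratic `toy_loss` are hypothetical placeholders; a real study would evaluate a trained network's loss on data instead.

```python
import numpy as np


def sign_consistency_ratio(theta, noise):
    """Fraction of coordinates where noise and theta have the same sign.

    Assumed definition for illustration; the paper's metric may differ.
    """
    theta = np.asarray(theta, dtype=float)
    noise = np.asarray(noise, dtype=float)
    return float(np.mean(np.sign(theta) == np.sign(noise)))


def sign_aligned_noise(theta, rng):
    """Gaussian magnitudes whose signs are copied from theta (hypothetical helper)."""
    return np.abs(rng.standard_normal(theta.shape)) * np.sign(theta)


def loss_along_direction(loss_fn, theta, noise, lambdas):
    """Evaluate loss_fn at theta + lam * noise for every lam in lambdas."""
    return np.array([loss_fn(theta + lam * noise) for lam in lambdas])


rng = np.random.default_rng(0)
theta_f = rng.standard_normal(1000)      # stand-in for a converged solution
eps_gauss = rng.standard_normal(1000)    # plain Gaussian direction
eps_aligned = sign_aligned_noise(theta_f, rng)

ratio_gauss = sign_consistency_ratio(theta_f, eps_gauss)
ratio_aligned = sign_consistency_ratio(theta_f, eps_aligned)


def toy_loss(params):
    # Placeholder quadratic bowl centered at theta_f; it is symmetric by
    # construction and only demonstrates the 1D slicing mechanics.
    return float(np.sum((params - theta_f) ** 2))


lambdas = np.linspace(-1.0, 1.0, 21)
curve = loss_along_direction(toy_loss, theta_f, eps_aligned, lambdas)
```

With a real network, `toy_loss` would be replaced by a function that loads the perturbed parameters into the model and returns its loss on a fixed batch; comparing the left and right halves of `curve` for `eps_gauss` and `eps_aligned` is one way to probe how the noise direction affects the apparent valley symmetry.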