NeurIPS2023
Phase diagram of early training dynamics in deep neural networks: effect of the learning rate, depth, and width
Dayal Singh Kalra, Maissam Barkeshli
被引用 19 次
摘要
We systematically analyze optimization dynamics in deep neural networks (DNNs) trained with stochastic gradient descent (SGD) and study the effect of learning rate , depth , and width of the neural network. By analyzing the maximum eigenvalue of the Hessian of the loss, which is a measure of sharpness of the loss landscape, we find that the dynamics can show four distinct regimes: (i) an early time transient regime, (ii) an intermediate saturation regime, (iii) a progressive sharpening regime, and (iv) a late time edge of stability"regime. The early and intermediate regimes (i) and (ii) exhibit a rich phase diagram depending on $\eta \equiv c / \lambda_0^H $, $d$, and $w$. We identify several critical values of $c$, which separate qualitatively distinct phenomena in the early time dynamics of training loss and sharpness. Notably, we discover the opening up of a sharpness reduction"phase, where sharpness decreases at early times, as and are increased.