NeurIPS 2023

Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff

Arthur Jacot

19 citations

Abstract

Previous work has shown that DNNs with large depth $L$ and $L_{2}$-regularization are biased towards learning low-dimensional representations of the inputs, which can be interpreted as minimizing a notion of rank $R^{(0)}(f)$ of the learned function $f$, conjectured to be the Bottleneck rank. We compute finite depth corrections to this result, revealing a measure $R^{(1)}$ of regularity which bounds the pseudo-determinant of the Jacobian $\left|Jf(x)\right|_{+}$ and is subadditive under composition and addition. This formalizes a balance between learning low-dimensional representations and minimizing complexity/irregularity in the feature maps, allowing the network to learn the `right' inner dimension. Finally, we prove the conjectured bottleneck structure in the learned features as $L\to\infty$: for large depths, almost all hidden representations are approximately $R^{(0)}(f)$-dimensional, and almost all weight matrices $W_{\ell}$ have $R^{(0)}(f)$ singular values close to 1 while the others are $O(L^{-\frac{1}{2}})$. Interestingly, the use of large learning rates is required to guarantee an order $O(L)$ NTK, which in turn guarantees infinite depth convergence of the representations of almost all layers.
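
To make the pseudo-determinant $\left|Jf(x)\right|_{+}$ concrete, the following is a minimal NumPy sketch. It assumes the usual convention that the pseudo-determinant of a (possibly rectangular) Jacobian is the product of its nonzero singular values; the abstract does not spell out the definition, so this convention, the helper name `pseudo_determinant`, and the tolerance `tol` are illustrative assumptions rather than the paper's implementation. The second example uses a linear map with a hypothetical inner dimension of 2, so its Jacobian is the matrix itself and has rank 2.

```python
import numpy as np


def pseudo_determinant(J, tol=1e-8):
    """Return (product of nonzero singular values of J, number of them).

    Assumption: singular values below tol * (largest singular value)
    are treated as zero. The empty product is taken to be 1.
    """
    s = np.linalg.svd(J, compute_uv=False)  # sorted in descending order
    if s.size == 0 or s[0] <= 0.0:
        return 1.0, 0
    kept = s[s > tol * s[0]]
    return float(np.prod(kept)), int(kept.size)


# Example 1: a diagonal Jacobian with one zero direction.
J_diag = np.diag([2.0, 3.0, 0.0])
pdet_diag, rank_diag = pseudo_determinant(J_diag)

# Example 2: a linear map R^5 -> R^5 factoring through a 2-dimensional
# bottleneck, f(x) = B @ (C @ x). Its Jacobian is B @ C everywhere.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 2))
C = rng.standard_normal((2, 5))
pdet_bottleneck, rank_bottleneck = pseudo_determinant(B @ C)

print("diagonal:   pdet =", pdet_diag, "rank =", rank_diag)
print("bottleneck: pdet =", pdet_bottleneck, "rank =", rank_bottleneck)
```

In this toy setting the number of retained singular values is the rank of the Jacobian, and their product is the pseudo-determinant that the abstract says is bounded by $R^{(1)}$.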