ICLR 2026

Closed-form $\ell_r$ norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias

Shuofeng Zhang, Ard A. Louis

Cited by 1

Abstract

For overparameterized linear regression with isotropic Gaussian design and the minimum-$\ell_p$ interpolator, $p\in(1,2]$, we give a unified, high-probability characterization of how the family of parameter norms $\{ \lVert \widehat{w_p} \rVert_r \}_{r \in [1,p]}$ scales with sample size.
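The object studied above can be approximated numerically. The following is a minimal sketch, not the authors' procedure: it uses iteratively reweighted least squares (IRLS), a standard majorize-minimize scheme, to approximate the minimum-$\ell_p$ interpolator and then evaluates several $\ell_r$ norms of the result. The function name `min_lp_interpolator`, the sparse ground truth `w_star`, the noise level, and the problem sizes are all illustrative assumptions.

```python
import numpy as np

def min_lp_interpolator(X, y, p, n_iter=200, eps=1e-12):
    """IRLS sketch for argmin ||w||_p subject to X w = y, with 1 < p <= 2.

    Hypothetical helper: a standard reweighting scheme, not taken from the paper.
    """
    if not 1.0 < p <= 2.0:
        raise ValueError("p must lie in (1, 2]")
    # Start from the minimum-l2 interpolator (the exact answer when p = 2).
    w = X.T @ np.linalg.solve(X @ X.T, y)
    for _ in range(n_iter):
        # d_i = (w_i^2 + eps)^((2 - p) / 2): weights of the quadratic majorizer.
        d = (w**2 + eps) ** ((2.0 - p) / 2.0)
        XD = X * d  # equals X @ diag(d), shape (n, dim)
        # Weighted minimum-norm solution: w = D X^T (X D X^T)^{-1} y.
        w = d * (X.T @ np.linalg.solve(XD @ X.T, y))
    return w

# Assumed toy setup: isotropic Gaussian design, one signal coordinate, small noise.
rng = np.random.default_rng(0)
n, dim = 40, 200
X = rng.standard_normal((n, dim))
w_star = np.zeros(dim)
w_star[0] = 1.0
y = X @ w_star + 0.1 * rng.standard_normal(n)

p = 1.5
w_p = min_lp_interpolator(X, y, p)
w_2 = min_lp_interpolator(X, y, 2.0)

# l_r norms of the l_p-biased interpolator for a few r in [1, p].
norms = {r: np.linalg.norm(w_p, ord=r) for r in (1.0, 1.25, 1.5)}
print(norms)
```

Repeating this computation over a grid of sample sizes `n` and plotting each entry of `norms` against `n` would be one way to look for the scaling behaviour described in the abstract.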

We resolve this basic but open question through a simple dual-ray analysis, which reveals a competition between a signal spike and a bulk of null coordinates in $X^\top Y$, yielding closed-form predictions for (i) a data-dependent transition $n_\star$ (the "elbow"), and (ii) a universal threshold $r_\star=2(p-1)$ that separates the norms $\lVert \widehat{w_p} \rVert_r$ which plateau from those that continue to grow with an explicit exponent.
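To make the threshold concrete, the sketch below locates $r_\star=2(p-1)$ relative to the family $r\in[1,p]$. Because $2(p-1)\le p$ exactly when $p\le 2$, and $2(p-1)\ge 1$ exactly when $p\ge 1.5$, the threshold falls inside $[1,p]$ only for $p\in[1.5,2]$. The abstract does not state which side of $r_\star$ plateaus, so the hypothetical helpers `r_star` and `split_family` only report the two sub-intervals and make no claim about their regimes.

```python
def r_star(p):
    """Threshold r_* = 2(p - 1) from the abstract, for p in (1, 2]."""
    if not 1.0 < p <= 2.0:
        raise ValueError("p must lie in (1, 2]")
    return 2.0 * (p - 1.0)

def split_family(p):
    """Split r in [1, p] at r_*; return (below, above), None where empty."""
    rs = r_star(p)
    below = (1.0, min(rs, p)) if rs > 1.0 else None
    above = (max(rs, 1.0), p) if rs < p else None
    return below, above

for p in (1.25, 1.5, 1.75, 2.0):
    print(p, r_star(p), split_family(p))
```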

This unified solution resolves the scaling of all $\ell_r$ norms within the family $r\in [1,p]$ under $\ell_p$-biased interpolation, and explains in one picture which norms saturate and which increase as $n$ grows.

We then study diagonal linear networks (DLNs) trained by gradient descent. By calibrating the initialization scale $\alpha$ to an effective $p_{\mathrm{eff}}(\alpha)$ via the DLN separable potential, we show empirically that DLNs inherit the same elbow/threshold laws, providing a predictive bridge between explicit and implicit bias.
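For readers unfamiliar with DLNs, here is a minimal sketch of training one by full-batch gradient descent. It assumes the common parametrization $w = u\odot u - v\odot v$ with $u = v = \alpha\mathbf{1}$ at initialization and a squared loss; the abstract does not specify the parametrization, so this choice, the function name `train_dln`, the step size, and the noiseless toy data are all assumptions. The calibration $p_{\mathrm{eff}}(\alpha)$ is not reproduced here.

```python
import numpy as np

def train_dln(X, y, alpha, lr=0.01, n_steps=20000):
    """Gradient descent on a diagonal linear network (assumed form w = u*u - v*v).

    Loss: (1 / 2n) * ||X w - y||^2. Returns the learned w and the final loss.
    """
    n, dim = X.shape
    u = np.full(dim, alpha)  # initialization scale alpha on both branches
    v = np.full(dim, alpha)
    for _ in range(n_steps):
        w = u * u - v * v
        g = X.T @ (X @ w - y) / n  # gradient of the loss with respect to w
        # Chain rule: dL/du = 2 u g and dL/dv = -2 v g.
        u, v = u - lr * 2.0 * u * g, v + lr * 2.0 * v * g
    w = u * u - v * v
    loss = 0.5 * np.mean((X @ w - y) ** 2)
    return w, loss

# Assumed toy setup: isotropic Gaussian design, one signal coordinate, no noise.
rng = np.random.default_rng(0)
n, dim = 20, 100
X = rng.standard_normal((n, dim))
w_star = np.zeros(dim)
w_star[0] = 1.0
y = X @ w_star

initial_loss = 0.5 * np.mean(y**2)  # loss at initialization, where w = 0
w_hat, final_loss = train_dln(X, y, alpha=0.1)
print(initial_loss, final_loss, np.linalg.norm(w_hat, ord=1))
```

Sweeping `alpha` and recording $\lVert \widehat{w} \rVert_r$ for several `r` would be a natural way to probe how the initialization scale changes the implicit bias, though larger `alpha` may need a smaller `lr` to remain stable.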

Given that many generalization proxies depend on $\lVert \widehat{w_p} \rVert_r$, our results suggest that their predictive power will depend sensitively on which $\ell_r$ norm is used.