NeurIPS 2023

Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective

Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu

Cited by 41

Abstract

We consider the problem of learning a single-index target function $f_* : \mathbb{R}^d \to \mathbb{R}$ under spiked covariance data, where the link function $\sigma_* : \mathbb{R} \to \mathbb{R}$ is a degree-$p$ polynomial with information exponent $k$ (defined as the lowest degree in the Hermite expansion of $\sigma_*$), and it depends on the projection of the input $x$ onto the spike (signal) direction $\mu \in \mathbb{R}^d$. In the proportional asymptotic limit where the number of training examples $n$ and the dimensionality $d$ jointly diverge, $n, d \to \infty$ with $n/d \to \psi \in (0, \infty)$, we ask the following question: how large should the spike magnitude $\theta$ be in order for (i) kernel methods and (ii) neural networks optimized by gradient descent to learn $f_*$? We show that for kernel ridge regression, $\beta \ge 1 - \frac{1}{p}$ is both sufficient and necessary, whereas for two-layer neural networks trained with gradient descent, $\beta > 1 - \frac{1}{k}$ suffices. Our results demonstrate that both kernel methods and neural networks benefit from low-dimensional structures in the data. Further, since $k \le p$ by definition, neural networks can adapt to such structures more effectively.
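To make the gap between $p$ and $k$ concrete, the following is a minimal sketch, not the authors' code. It computes the information exponent of a polynomial link by re-expanding it in the probabilists' Hermite basis and prints the two thresholds stated above. It also draws a sample with covariance $I_d + \theta \mu \mu^\top$. Several things here are assumptions made for illustration: the helper names (`monomial_to_hermite`, `information_exponent`, `sample_spiked`), the Gaussian form of the spiked data, the scaling $\theta = d^\beta$, and the convention that the constant Hermite term is ignored when computing $k$.

```python
import math
import random


def monomial_to_hermite(coeffs):
    """Re-express sum_j coeffs[j] * z**j in the probabilists' Hermite basis
    He_0, He_1, ... and return the Hermite coefficients."""
    out = [0.0] * len(coeffs)
    for n, c in enumerate(coeffs):
        # Identity: z**n = sum_m n! / (2**m * m! * (n - 2m)!) * He_{n-2m}(z)
        for m in range(n // 2 + 1):
            weight = math.factorial(n) / (
                2**m * math.factorial(m) * math.factorial(n - 2 * m)
            )
            out[n - 2 * m] += c * weight
    return out


def information_exponent(coeffs, tol=1e-12):
    """Lowest degree k >= 1 with a nonzero Hermite coefficient.
    Ignoring the constant term is an assumption of this sketch."""
    herm = monomial_to_hermite(coeffs)
    for k in range(1, len(herm)):
        if abs(herm[k]) > tol:
            return k
    raise ValueError("link function is constant")


def sample_spiked(d, theta, mu, rng):
    """Draw x ~ N(0, I_d + theta * mu mu^T) for a unit vector mu
    (assumed Gaussian spiked model)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    proj = sum(zi * mi for zi, mi in zip(z, mu))
    # (I + scale * mu mu^T)^2 = I + theta * mu mu^T when scale = sqrt(1 + theta) - 1
    scale = math.sqrt(1.0 + theta) - 1.0
    return [zi + scale * proj * mi for zi, mi in zip(z, mu)]


# Two cubic links (p = 3) with different information exponents.
# The leading coefficient is nonzero, so p = len(coeffs) - 1.
cubic = [0.0, 0.0, 0.0, 1.0]   # z^3 = He_3(z) + 3 He_1(z)
he3 = [0.0, -3.0, 0.0, 1.0]    # z^3 - 3z = He_3(z)
for name, coeffs in [("z^3", cubic), ("He_3", he3)]:
    p = len(coeffs) - 1
    k = information_exponent(coeffs)
    print(name, "p =", p, "k =", k,
          "| kernel threshold 1 - 1/p =", 1 - 1 / p,
          "| network threshold 1 - 1/k =", 1 - 1 / k)

# One draw from the assumed spiked model with theta = d**beta.
d, beta = 50, 0.5
mu = [1.0] + [0.0] * (d - 1)
x = sample_spiked(d, d**beta, mu, random.Random(0))
```

Both links have the same degree, but $z^3$ has a nonzero first Hermite coefficient, so its information exponent is 1, while $\mathrm{He}_3$ has information exponent 3. Under the thresholds quoted in the abstract, only the first case separates the kernel and neural network requirements on $\beta$.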