NeurIPS 2023
Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond
Taiji Suzuki, Denny Wu, Kazusato Oko, Atsushi Nitanda
Abstract
Neural networks in the mean-field regime are known to be capable of feature learning, unlike their kernel (NTK) counterparts. Recent works have shown that mean-field neural networks can be globally optimized by a noisy gradient descent update termed the mean-field Langevin dynamics (MFLD). However, all existing guarantees for MFLD consider only optimization efficiency, and it is unclear whether this algorithm leads to improved generalization performance and sample complexity due to the presence of feature learning. To fill this important gap, in this work we study the sample complexity of MFLD in learning a class of binary classification problems. Unlike existing margin bounds for neural networks, we avoid the typical norm control by utilizing the perspective that MFLD optimizes the distribution of parameters rather than the parameters themselves; this leads to an improved analysis of the sample complexity and convergence rate. We apply our general framework to the learning of k-sparse parity functions, where we prove that, unlike kernel methods, two-layer neural networks optimized by MFLD achieve a sample complexity in which the degree k is “decoupled” from the exponent in the dimension dependence.
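To make the "noisy gradient descent" description concrete, the following is a minimal illustrative sketch of one discretized MFLD-style update on a k-sparse parity task. It is not the authors' implementation: the tanh activation, the logistic loss, the function names `sparse_parity` and `mfld_step`, and the hyperparameters `lr`, `lam` (weight decay) and `temp` (noise temperature) are all assumptions chosen for illustration.

```python
import numpy as np


def sparse_parity(n, d, k, rng):
    """Sample n inputs uniformly from {-1, +1}^d.

    The label is the product of the first k coordinates (a k-sparse parity).
    """
    X = rng.choice([-1.0, 1.0], size=(n, d))
    y = np.prod(X[:, :k], axis=1)
    return X, y


def mfld_step(W, X, y, lr, lam, temp, rng):
    """One noisy gradient step on the neurons of a mean-field two-layer net.

    Assumed model (illustrative): f(x) = (1/N) * sum_j tanh(w_j . x),
    trained with the logistic loss log(1 + exp(-y f(x))).
    W has shape (N, d): one row per neuron (particle).
    """
    n = X.shape[0]
    act = np.tanh(X @ W.T)                 # (n, N) neuron activations
    f = act.mean(axis=1)                   # (n,) mean-field network output
    dl = -y / (1.0 + np.exp(y * f))        # derivative of logistic loss w.r.t. f
    # Per-neuron gradient of the loss (mean-field scaling: no 1/N factor).
    grad = ((dl[:, None] * (1.0 - act ** 2)).T @ X) / n   # (N, d)
    noise = rng.standard_normal(W.shape)   # Gaussian noise of the Langevin step
    return W - lr * (grad + lam * W) + np.sqrt(2.0 * lr * temp) * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = sparse_parity(n=256, d=10, k=2, rng=rng)
    W = rng.standard_normal((64, 10))      # 64 neurons, hypothetical width
    for _ in range(200):
        W = mfld_step(W, X, y, lr=0.1, lam=1e-3, temp=1e-3, rng=rng)
    train_acc = np.mean(np.sign(np.tanh(X @ W.T).mean(axis=1)) == y)
    print("train accuracy:", train_acc)
```

Each row of `W` plays the role of a particle from the parameter distribution, so the update acts on an empirical approximation of that distribution rather than on a single parameter vector. Setting `temp=0` recovers plain gradient descent with weight decay.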