ICML 2020

The Role of Regularization in Classification of High-dimensional Noisy Gaussian Mixture

Francesca Mignacco, Florent Krzakala, Yue M. Lu, Pierfrancesco Urbani, Lenka Zdeborová

Cited by 98

Abstract

We consider a high-dimensional mixture of two Gaussians in the noisy regime, where even an oracle knowing the centers of the clusters misclassifies a small but finite fraction of the points. We provide a rigorous analysis of the generalization error of regularized convex classifiers, including ridge, hinge and logistic regression, in the high-dimensional limit where the number $n$ of samples and their dimension $d$ go to infinity while their ratio is fixed to $\alpha = n/d$. We discuss surprising effects of the regularization, which in some cases allows one to reach the Bayes-optimal performance. We also illustrate the interpolation peak at low regularization, and analyze the role of the respective sizes of the two clusters.