NeurIPS 2025

A Geometric Analysis of PCA

Ayoub El Hanchi, Murat A. Erdogdu, Chris J. Maddison

Abstract

What property of the data distribution determines the excess risk of principal component analysis? In this paper, we provide a precise answer to this question. We establish a central limit theorem for the error of the principal subspace estimated by PCA, and derive the asymptotic distribution of its excess risk under the reconstruction loss. We obtain a non-asymptotic upper bound on the excess risk of PCA that recovers, in the large sample limit, our asymptotic characterization. Underlying our contributions is the following result: we prove that the negative block Rayleigh quotient, defined on the Grassmannian, is generalized self-concordant along geodesics emanating from its minimizer of maximum rotation less than π/4.
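To make the quantities named in the abstract concrete, the following is a minimal numerical sketch rather than the paper's own construction. It assumes centered Gaussian data with a known diagonal covariance, takes the reconstruction risk of an orthonormal basis U to be E||x - U U^T x||^2 = tr(Σ) - tr(U^T Σ U), and treats -tr(U^T Σ U) as the negative block Rayleigh quotient up to normalization. The function names, the covariance, and the sample sizes are illustrative choices, not taken from the paper.

```python
import numpy as np


def reconstruction_risk(U, Sigma):
    """Population risk E||x - U U^T x||^2 for centered x with covariance Sigma.

    For U with orthonormal columns this equals tr(Sigma) - tr(U^T Sigma U),
    so minimizing it is the same as minimizing -tr(U^T Sigma U).
    """
    return np.trace(Sigma) - np.trace(U.T @ Sigma @ U)


def top_k_subspace(S, k):
    """Orthonormal basis of the top-k eigenspace of a symmetric matrix S."""
    _, eigvecs = np.linalg.eigh(S)  # eigenvalues are returned in ascending order
    return eigvecs[:, -k:]


def pca_excess_risk(n, Sigma, k, rng):
    """Excess reconstruction risk of PCA fitted on n samples (illustrative)."""
    d = Sigma.shape[0]
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    S_hat = X.T @ X / n  # assumption: the mean is known to be zero
    U_hat = top_k_subspace(S_hat, k)
    U_star = top_k_subspace(Sigma, k)
    return reconstruction_risk(U_hat, Sigma) - reconstruction_risk(U_star, Sigma)


rng = np.random.default_rng(0)
Sigma = np.diag([5.0, 3.0, 1.0, 0.5])  # hypothetical covariance with an eigengap at k = 2

samples_small = [pca_excess_risk(50, Sigma, 2, rng) for _ in range(200)]
samples_large = [pca_excess_risk(5000, Sigma, 2, rng) for _ in range(200)]
excess_small = float(np.mean(samples_small))
excess_large = float(np.mean(samples_large))
print(f"mean excess risk, n=50:   {excess_small:.5f}")
print(f"mean excess risk, n=5000: {excess_large:.5f}")
```

Under these assumptions the excess risk is nonnegative for every draw, since the top-k eigenspace of Σ maximizes tr(U^T Σ U) over orthonormal U, and its average shrinks as the sample size grows. The sketch only illustrates what is being measured; it does not reproduce the limiting distribution or the non-asymptotic bound established in the paper.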