NeurIPS2024

Invariant subspaces and PCA in nearly matrix multiplication time

Aleksandros Sobczyk, Marko Mladenovic, Mathieu Luisier

摘要

Approximating invariant subspaces of generalized eigenvalue problems (GEPs) is a fundamental computational problem at the core of machine learning and scientific computing. It is, for example, the root of Principal Component Analysis (PCA) for dimensionality reduction, data visualization, and noise filtering, and of Density Functional Theory (DFT), arguably the most popular method to calculate the electronic structure of materials. Given Hermitian H,SCn×nH,S\in\mathbb{C}^{n\times n}, where SS is positive-definite, let Πk\Pi_k be the true spectral projector on the invariant subspace that is associated with the kk smallest (or largest) eigenvalues of the GEP HC=SCΛHC=SC\Lambda, for some k[n]k\in[n]. We show that we can compute a matrix Π~k\widetilde\Pi_k such that ΠkΠ~k2ϵ\lVert\Pi_k-\widetilde\Pi_k\rVert_2\leq \epsilon, in O(nω+ηpolylog(n,ϵ1,κ(S),gapk1))O\left( n^{\omega+\eta}\mathrm{polylog}(n,\epsilon^{-1},\kappa(S),\mathrm{gap}_k^{-1}) \right) bit operations in the floating point model, for some ϵ(0,1)\epsilon\in(0,1), with probability 11/n1-1/n. Here, η>0\eta>0 is arbitrarily small, ω2.372\omega\lesssim 2.372 is the matrix multiplication exponent, κ(S)=S2S12\kappa(S)=\lVert S\rVert_2\lVert S^{-1}\rVert_2, and gapk\mathrm{gap}_k is the gap between eigenvalues kk and k+1k+1. To achieve such provable"forward-error"guarantees, our methods rely on a new O(nω+η)O(n^{\omega+\eta}) stability analysis for the Cholesky factorization, and a smoothed analysis for computing spectral gaps, which can be of independent interest. Ultimately, we obtain new matrix multiplication-type bit complexity upper bounds for PCA problems, including classical PCA and (randomized) low-rank approximation.