ICML2021
Two-way kernel matrix puncturing: towards resource-efficient PCA and spectral clustering
Romain Couillet, Florent Chatelain, Nicolas Le Bihan
被引用 11 次
摘要
The article introduces an elementary cost and storage reduction method for spectral clustering and principal component analysis. The method consists in randomly"puncturing"both the data matrix (or ) and its corresponding kernel (Gram) matrix through Bernoulli masks: for and for . The resulting"two-way punctured"kernel is thus given by . We demonstrate that, for composed of independent columns drawn from a Gaussian mixture model, as with , the spectral behavior of -- its limiting eigenvalue distribution, as well as its isolated eigenvalues and eigenvectors -- is fully tractable and exhibits a series of counter-intuitive phenomena. We notably prove, and empirically confirm on GAN-generated image databases, that it is possible to drastically puncture the data, thereby providing possibly huge computational and storage gains, for a virtually constant (clustering of PCA) performance. This preliminary study opens as such the path towards rethinking, from a large dimensional standpoint, computational and storage costs in elementary machine learning models.