NeurIPS2024

Approximating the Top Eigenvector in Random Order Streams

Praneeth Kacham, David P. Woodruff

Abstract

When rows of an n×dn \times d matrix AA are given in a stream, we study algorithms for approximating the top eigenvector of the matrix ATA{A}^TA (equivalently, the top right singular vector of AA). We consider worst case inputs AA but assume that the rows are presented to the streaming algorithm in a uniformly random order. We show that when the gap parameter R=σ1(A)2/σ2(A)2=Ω(1)R = \sigma_1(A)^2/\sigma_2(A)^2 = \Omega(1), then there is a randomized algorithm that uses O(hdpolylog(d))O(h \cdot d \cdot \operatorname{polylog}(d)) bits of space and outputs a unit vector vv that has a correlation 1O(1/R)1 - O(1/\sqrt{R}) with the top eigenvector v1v_1. Here hh denotes the number of heavy rows in the matrix, defined as the rows with Euclidean norm at least AF/dpolylog(d)\|{A}\|_F/\sqrt{d \cdot \operatorname{polylog}(d)}. We also provide a lower bound showing that any algorithm using O(hd/R)O(hd/R) bits of space can obtain at most 1Ω(1/R2)1 - \Omega(1/R^2) correlation with the top eigenvector. Thus, parameterizing the space complexity in terms of the number of heavy rows is necessary for high accuracy solutions. Our results improve upon the R=Ω(lognlogd)R = \Omega(\log n \cdot \log d) requirement in a recent work of Price and Xun (FOCS 2024). We note that the algorithm of Price and Xun works for arbitrary order streams whereas our algorithm requires a stronger assumption that the rows are presented in a uniformly random order. We additionally show that the gap requirements in their analysis can be brought down to R=Ω(log2d)R = \Omega(\log^2 d) for arbitrary order streams and R=Ω(logd)R = \Omega(\log d) for random order streams. The requirement of R=Ω(logd)R = \Omega(\log d) for random order streams is nearly tight for their analysis as we obtain a simple instance with R=Ω(logd/loglogd)R = \Omega(\log d/\log\log d) for which their algorithm, with any fixed learning rate, cannot output a vector approximating the top eigenvector v1v_1.