ICML2024

High-Dimensional Geometric Streaming for Nearly Low Rank Data

Hossein Esfandiari, Praneeth Kacham, Vahab Mirrokni, David P. Woodruff, Peilin Zhong

1 citation

Abstract

We study streaming algorithms for the p\ell_p subspace approximation problem. Given points a1,,ana_1, \ldots, a_n as an insertion-only stream and a rank parameter kk, the p\ell_p subspace approximation problem is to find a kk-dimensional subspace VV such that (i=1nd(ai,V)p)1/p(\sum_{i=1}^n d(a_i, V)^p)^{1/p} is minimized, where d(a,V)d(a, V) denotes the Euclidean distance between aa and VV defined as minvVav\min_{v \in V}\|{a - v}\|_{\infty}. When p=p = \infty, we need to find a subspace VV that minimizes maxid(ai,V)\max_i d(a_i, V). For \ell_{\infty} subspace approximation, we give a deterministic strong coreset construction algorithm and show that it can be used to compute a poly(k,logn)\text{poly}(k, \log n) approximate solution. We show that the distortion obtained by our coreset is nearly tight for any sublinear space algorithm. For p\ell_p subspace approximation, we show that suitably scaling the points and then using our \ell_{\infty} coreset construction, we can compute a poly(k,logn)\text{poly}(k, \log n) approximation. Our algorithms are easy to implement and run very fast on large datasets. We also use our strong coreset construction to improve the results in a recent work of Woodruff and Yasuda (FOCS 2022) which gives streaming algorithms for high-dimensional geometric problems such as width estimation, convex hull estimation, and volume estimation.