ICML2024
High-Dimensional Geometric Streaming for Nearly Low Rank Data
Hossein Esfandiari, Praneeth Kacham, Vahab Mirrokni, David P. Woodruff, Peilin Zhong
被引用 1 次
摘要
We study streaming algorithms for the subspace approximation problem. Given points as an insertion-only stream and a rank parameter , the subspace approximation problem is to find a -dimensional subspace such that is minimized, where denotes the Euclidean distance between and defined as . When , we need to find a subspace that minimizes . For subspace approximation, we give a deterministic strong coreset construction algorithm and show that it can be used to compute a approximate solution. We show that the distortion obtained by our coreset is nearly tight for any sublinear space algorithm. For subspace approximation, we show that suitably scaling the points and then using our coreset construction, we can compute a approximation. Our algorithms are easy to implement and run very fast on large datasets. We also use our strong coreset construction to improve the results in a recent work of Woodruff and Yasuda (FOCS 2022) which gives streaming algorithms for high-dimensional geometric problems such as width estimation, convex hull estimation, and volume estimation.