ICML2024
Coresets for Multiple ℓp Regression
David P. Woodruff, Taisuke Yasuda
3 citations
Abstract
The ℓ p regression problem takes as input a matrix A ∈ R n×d , a vector b ∈ R n , and a number p ∈ [1, ∞), and it returns as output a number Z and a vector In this paper, we construct coresets and obtain an efficient two-stage sampling-based approximation algorithm for the very overconstrained (n ≫ d) version of this classical problem, for all p ∈ [1, ∞). The first stage of our algorithm non-uniformly samples r1 = O(36 p d maxp/2+1,p+1 ) rows of A and the corresponding elements of b, and then it solves the ℓ p regression problem on the sample; we prove this is an 8-approximation. The second stage of our algorithm uses the output of the first stage to resample r1 /ǫ 2 constraints, and then it solves the ℓ p regression problem on the new sample; we prove this is a (1 + ǫ)-approximation. Our algorithm unifies, improves upon, and extends the existing algorithms for special cases of ℓ p regression, namely p = 1, 2 [10, 13] . In course of proving our result, we develop two concepts-well-conditioned bases and subspace-preserving sampling-that are of independent interest.