ICML2025

Batch List-Decodable Linear Regression via Higher Moments

Ilias Diakonikolas, Daniel Kane, Sushrut Karmalkar, Sihan Liu, Thanasis Pittas

摘要

We study the task of list-decodable linear regression using batches. A batch is called clean if the points it contains are i.i.d. samples from an unknown linear regression distribution. For a parameter α ∈ (0, 1/2), an unknown α-fraction of the batches are clean and no assumptions are made on the remaining batches. The goal is to output a small list of vectors at least one of which is close to the true regressor vector in ℓ 2 -norm. [DJKS23] gave an efficient algorithm for this task, under natural distributional assumptions, with the following guarantee. Under the assumption that the batch size n satisfies n ≥ Ω(α -1 ) and the number of batches is m = poly(d, n, 1/α), their algorithm runs in polynomial time and outputs a list of O(1/α 2 ) vectors at least one of which is Õ(α -1/2 / √ n) close to the target regressor. Here we design a new polynomial-time algorithm for this task with significantly stronger guarantees under the assumption that the low-degree moments of the covariates distribution are Sum-of-Squares (SoS) certifiably bounded. Specifically, for any constant δ > 0, as long as the batch size is n ≥ Ω δ (α -δ ) and the degree-Θ(1/δ) moments of the covariates are SoS certifiably bounded, our algorithm uses m = poly((dn) 1/δ , 1/α) batches, runs in polynomial-time, and outputs an O(1/α)-sized list of vectors one of which is O(α -δ/2 / √ n) close to the target. That is, our algorithm achieves substantially smaller minimum batch size and final error, while achieving the optimal list size. Our approach leverages higher-order moment information by carefully combining the SoS paradigm interleaved with an iterative method and a novel list pruning procedure for this setting. In the process, we give an SoS proof of the Marcinkiewicz-Zygmund inequality that may be of broader applicability.