STOC2022

List-decodable covariance estimation

Misha Ivkov, Pravesh K. Kothari

5 citations

Abstract

We give the first polynomial time algorithm for list-decodable covariance estimation. For any > 0, our algorithm takes input a sample ⊆ ℝ of size poly(1/ ) obtained by adversarially corrupting an (1 -) points in an i.i.d. sample of size from the Gaussian distribution with unknown mean * and covariance Σ * . In poly(1/ ) time, it outputs a constant-size list of = ( ) = (1/ ) poly(1/ ) candidate parameters that, with high probability, contains a ( ˆ , Σ) such that the total variation distance . This is the statistically strongest notion of distance and implies multiplicative spectral and relative Frobenius distance approximation with dimension independent error. Our algorithm works more generally for (1 -)-corruptions of any distribution that possesses low-degree sum-of-squares certificates of two natural analytic properties: 1) anti-concentration of one-dimensional marginals and 2) hypercontractivity of degree 2 polynomials. Prior to our work, the only known results for estimating covariance in the list-decodable setting were for the special cases of list-decodable linear regression and subspace recovery [KKK19, RY19, BK21, RY20b]. The best-known algorithms for both these problems only yield a weak recovery guarantee that needs super-polynomial time for any sub-constant (in dimension ) target error for the parameters in natural norms. As a corollary, our result yields the first polynomial time exact algorithm for list-decodable linear regression and subspace recovery that, in particular, obtain 2 -poly( ) error in polynomial-time in the underlying dimension. List-decodable setting also generalizes the problem of robust clustering non-spherical mixtures in the strong contamination model [BK20b, DHKK20] and the state of the art [BK20b] for this latter problem needs ( ) samples and tolerates an ≪ -( ) fraction outliers. Our result implies an algorithm with an improved running time and sample bound of poly( ) that handles a larger ≪ 1/poly( ) fraction of outliers.