STOC 2022
Clustering mixtures with almost optimal separation in polynomial time
Allen Liu, Jerry Li
Abstract
We consider the problem of clustering mixtures of mean-separated Gaussians in high dimensions. We are given samples from a mixture of k identity-covariance Gaussians such that the minimum pairwise distance between any two means is at least Δ, for some parameter Δ > 0, and the goal is to recover the ground-truth clustering of these samples. It is folklore that separation Δ = Θ(√(log k)) is both necessary and sufficient to recover a good clustering (say, with constant or 1/poly(k) error), at least information-theoretically. However, the estimators that achieve this guarantee are inefficient. We give the first algorithm that runs in polynomial time and almost matches this guarantee. More precisely, we give an algorithm that takes polynomially many samples and time and successfully recovers a good clustering, so long as the separation is Δ = Ω((log k)^(1/2 + c)), for any c > 0. Previously, polynomial-time algorithms for this problem were known only when the separation was polynomial in k, and all algorithms that could tolerate poly(log k) separation required quasipolynomial time. We also extend our result to mixtures of translations of a distribution that satisfies the Poincaré inequality, under additional mild assumptions.
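To make the problem setup concrete, here is a small Python sketch (standard library only). It builds k identity-covariance Gaussian components whose means are exactly Δ apart, samples from their uniform mixture, and measures how often an oracle that already knows the true means assigns a sample to the wrong component. This is not the algorithm from the paper. The oracle nearest-mean rule, the placement of means on scaled basis vectors, the uniform mixing weights, and all function names (`separated_means`, `sample_mixture`, `nearest_mean_labels`, `clustering_error`, `min_pairwise_distance`) are hypothetical choices made for illustration. The sketch only shows how the clustering error falls as Δ grows past the √(log k) scale (with the natural logarithm, √(log 8) ≈ 1.44).

```python
import math
import random


def separated_means(k, delta):
    """Hypothetical construction: put mean i at (delta / sqrt(2)) * e_i in R^k,
    so every pair of means is exactly delta apart."""
    scale = delta / math.sqrt(2.0)
    return [[scale if coord == i else 0.0 for coord in range(k)] for i in range(k)]


def min_pairwise_distance(means):
    """Smallest Euclidean distance between any two distinct means (the separation)."""
    best = math.inf
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            best = min(best, math.dist(means[i], means[j]))
    return best


def sample_mixture(means, n, rng):
    """Draw n labelled samples from the uniform mixture of N(mu_i, I) components."""
    labels, points = [], []
    for _ in range(n):
        i = rng.randrange(len(means))  # pick a component uniformly at random
        labels.append(i)
        points.append([m + rng.gauss(0.0, 1.0) for m in means[i]])  # add N(0, I) noise
    return labels, points


def nearest_mean_labels(points, means):
    """Oracle baseline (assumption, not the paper's method): assign each point
    to the closest *true* mean."""
    return [min(range(len(means)), key=lambda i: math.dist(p, means[i])) for p in points]


def clustering_error(labels, predicted):
    """Fraction of samples whose predicted label differs from the ground truth."""
    return sum(a != b for a, b in zip(labels, predicted)) / len(labels)


if __name__ == "__main__":
    rng = random.Random(0)
    k = 8
    for delta in (1.0, 10.0):
        means = separated_means(k, delta)
        labels, points = sample_mixture(means, 2000, rng)
        err = clustering_error(labels, nearest_mean_labels(points, means))
        print(f"delta={delta}: separation={min_pairwise_distance(means):.3f}, error={err:.4f}")
```

Because the oracle uses the true means, its predicted labels line up with the ground-truth labels directly and no permutation matching is needed. An actual clustering algorithm must estimate the components from samples alone, and its error is measured up to a relabeling of the clusters.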