STOC2022
Towards optimal lower bounds for k-median and k-means coresets
Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn
被引用 20 次
摘要
The (k,z)-clustering problem consists of finding a set of k points called centers, such that the sum of distances raised to the power of z of every data point to its closest center is minimized. Among the most commonly encountered special cases are k-median problem (z=1) and k-means problem (z=2). The k-median and k-means problems are at the heart of modern data analysis and massive data applications have given raise to the notion of coreset: a small (weighted) subset of the input point set preserving the cost of any solution to the problem up to a multiplicative (1± ε) factor, hence reducing from large to small scale the input to the problem.