ICML2023

On Coresets for Clustering in Small Dimensional Euclidean spaces

Lingxiao Huang, Ruiyuan Huang, Zengfeng Huang, Xuan Wu

被引用 7 次

摘要

We consider the problem of constructing small coresets for kk-Median in Euclidean spaces. Given a large set of data points PRdP\subset \mathbb{R}^d, a coreset is a much smaller set SRdS\subset \mathbb{R}^d, so that the kk-Median costs of any kk centers w.r.t. PP and SS are close. Existing literature mainly focuses on the high-dimension case and there has been great success in obtaining dimension-independent bounds, whereas the case for small dd is largely unexplored. Considering many applications of Euclidean clustering algorithms are in small dimensions and the lack of systematic studies in the current literature, this paper investigates coresets for kk-Median in small dimensions. For small dd, a natural question is whether existing near-optimal dimension-independent bounds can be significantly improved. We provide affirmative answers to this question for a range of parameters. Moreover, new lower bound results are also proved, which are the highest for small dd. In particular, we completely settle the coreset size bound for 11-d kk-Median (up to log factors). Interestingly, our results imply a strong separation between 11-d 11-Median and 11-d 22-Median. As far as we know, this is the first such separation between k=1k=1 and k=2k=2 in any dimension.