ICML2023

The Power of Uniform Sampling for k-Median

Lingxiao Huang, Shaofeng H.-C. Jiang, Jianing Lou

7 citations

Abstract

We study the power of uniform sampling for kk-Median in various metric spaces. We relate the query complexity for approximating kk-Median, to a key parameter of the dataset, called the balancedness β(0,1]\beta \in (0, 1] (with 11 being perfectly balanced). We show that any algorithm must make Ω(1/β)\Omega(1 / \beta) queries to the point set in order to achieve O(1)O(1)-approximation for kk-Median. This particularly implies existing constructions of coresets, a popular data reduction technique, cannot be query-efficient. On the other hand, we show a simple uniform sample of poly(kϵ1β1)\mathrm{poly}(k \epsilon^{-1} \beta^{-1}) points suffices for (1+ϵ)(1 + \epsilon)-approximation for kk-Median for various metric spaces, which nearly matches the lower bound. We conduct experiments to verify that in many real datasets, the balancedness parameter is usually well bounded, and that the uniform sampling performs consistently well even for the case with moderately large balancedness, which justifies that uniform sampling is indeed a viable approach for solving kk-Median.