ICLR2026

Stable coresets: Unleashing the power of uniform sampling

Amir Carmel, Robert Krauthgamer

摘要

Uniform sampling is a highly efficient method for data summarization. However, its effectiveness in producing coresets for clustering problems is not yet well understood, primarily because it generally does not yield a strong coreset, which is the prevailing notion in the literature. We formulate stable coresets, a notion that is intermediate between the standard notions of weak and strong coresets, and effectively combines the broad applicability of strong coresets with highly efficient constructions, through uniform sampling, of weak coresets. Our main result is that a uniform sample of size O(ϵ2logd)O(\epsilon^{-2}\log d) yields, with high constant probability, a stable coreset for 11-median in Rd\mathbb{R}^d under the 1\ell_1 metric. We then leverage the powerful properties of stable coresets to easily derive new coreset constructions, all through uniform sampling, for 1\ell_1 and related metrics, such as Kendall-tau and Jaccard. We also show applications to fair rank aggregation and to approximation algorithms for kk-median problem in these metric spaces. Our experiments validate the benefits of stable coresets in practice, in terms of both construction time and approximation quality.