ICLR2026

Metric kk-clustering using only Weak Comparison Oracles

Rahul Raychaudhury, Aryan Esmailpour, sainyam galhotra, Stavros Sintos

摘要

Clustering is a fundamental primitive in unsupervised learning. However, classical algorithms for kk-clustering (such as kk-median and kk-means) assume access to exact pairwise distances, which is an unrealistic requirement in many modern applications. We study clustering in the Rank-model (R-model), where access to distances is entirely replaced by a quadruplet oracle that provides only relative distance comparisons. In practice, such an oracle can represent learned models or human feedback, and is expected to be noisy and entail an access cost.

Given a metric space with nn input items, we design randomized algorithms that, using only a noisy quadruplet oracle, compute a set of O(kpolylog(n))O(k \cdot \mathsf{polylog}(n)) centers along with a mapping from the input items to the centers such that the clustering cost of the mapping is at most constant times the optimum kk-clustering cost. Our method achieves a query complexity of O(nkpolylog(n))O(n\cdot k \cdot \mathsf{polylog}(n)) for arbitrary metric spaces and improves to O((n+k2)polylog(n))O((n+k^2) \cdot \mathsf{polylog}(n)) when the underlying metric has bounded doubling dimension. When the metric has bounded doubling dimension we can further improve the approximation from constant to 1+ε1+\varepsilon, for any arbitrarily small constant ε(0,1)\varepsilon\in(0,1), while preserving the same asymptotic query complexity. Our framework demonstrates how noisy, low-cost oracles, such as those derived from large language models, can be systematically integrated into scalable clustering algorithms.