ICML 2025

Relative Error Fair Clustering in the Weak-Strong Oracle Model

Vladimir Braverman, Prathamesh Dharangutte, Shaofeng H.-C. Jiang, Hoai-An Nguyen, Chen Wang, Yubo Zhang, Samson Zhou

Abstract

We study fair clustering in a setting where distance information comes from two sources: a strong oracle that provides exact distances at a high cost, and a weak oracle that provides potentially inaccurate distance estimates at a low cost. The goal is to produce a near-optimal fair clustering of n input points using as few strong oracle queries as possible. This models the increasingly common trade-off between accurate but expensive similarity measures (e.g., large-scale embeddings) and cheaper but inaccurate alternatives. The study of fair clustering in this model is motivated by the need to achieve fairness in the presence of inaccurate information. We present the first (1 + ε)-coresets for fair k-median clustering, using only poly(k/ε · log n) queries to the strong oracle. Furthermore, our results imply coresets for the standard setting (without fairness constraints): we obtain (1 + ε)-coresets for (k, z)-clustering for general z = O(1) with a similar number of strong oracle queries. In contrast, previous results in this model achieved only a constant-factor (> 10) approximation for the standard k-clustering problems, and no previous work considered the fair k-median clustering problem.