ICLR2026

Distributed Algorithms for Euclidean Clustering

Vincent Cohen-Addad, Liudeng Wang, David Woodruff, Samson Zhou

摘要

We study the problem of constructing (1+ε)(1+\varepsilon)-coresets for Euclidean (k,z)(k,z)-clustering in the distributed setting, where nn data points are partitioned across ss sites. We focus on two prominent communication models: the coordinator model and the blackboard model. In the coordinator model, we design a protocol that achieves a (1+ε)(1+\varepsilon)-strong coreset with total communication complexity O~(sk+dkmin(ε4,ε2+z)+dklog(nΔ))\tilde{O}\left(sk + \frac{dk}{\min(\varepsilon^4,\varepsilon^{2+z})} + dk\log(n\Delta)\right) bits, improving upon prior work (Chen et al., NeurIPS 2016) by eliminating the need to communicate explicit point coordinates in-the-clear across all servers. In the blackboard model, we further reduce the communication complexity to O~(slog(nΔ)+dklog(nΔ)+dkmin(ε4,ε2+z))\tilde{O}\left(s\log(n\Delta) + dk\log(n\Delta) + \frac{dk}{\min(\varepsilon^4,\varepsilon^{2+z})}\right) bits, achieving better bounds than previous approaches while upgrading from constant-factor to (1+ε)(1+\varepsilon)-approximation guarantees. Our techniques combine new strategies for constant-factor approximation with efficient coreset constructions and compact encoding schemes, leading to optimal protocols that match both the communication costs of the best-known offline coreset constructions and existing lower bounds (Chen et al., NeurIPS 2016, Huang et. al., STOC 2024), up to polylogarithmic factors.