ICLR2026
Distributed Algorithms for Euclidean Clustering
Vincent Cohen-Addad, Liudeng Wang, David Woodruff, Samson Zhou
Abstract
We study the problem of constructing -coresets for Euclidean -clustering in the distributed setting, where data points are partitioned across sites. We focus on two prominent communication models: the coordinator model and the blackboard model. In the coordinator model, we design a protocol that achieves a -strong coreset with total communication complexity bits, improving upon prior work (Chen et al., NeurIPS 2016) by eliminating the need to communicate explicit point coordinates in-the-clear across all servers. In the blackboard model, we further reduce the communication complexity to bits, achieving better bounds than previous approaches while upgrading from constant-factor to -approximation guarantees. Our techniques combine new strategies for constant-factor approximation with efficient coreset constructions and compact encoding schemes, leading to optimal protocols that match both the communication costs of the best-known offline coreset constructions and existing lower bounds (Chen et al., NeurIPS 2016, Huang et. al., STOC 2024), up to polylogarithmic factors.