NeurIPS2022
Coreset for Line-Sets Clustering
Sagi Lotan, Ernesto Evgeniy Sanches Shayda, Dan Feldman
3 citations
Abstract
The input to the line-sets k -median problem is an integer k ≥ 1 , and a set L = L 1 , . . . , L n that contains n sets of lines in R d . The goal is to compute a set C of k centers (points in R d ) that minimizes the sum (cid:80) L ∈L min ℓ ∈ L,c ∈ C dist( ℓ, c ) of Euclidean distances from each set to its closest center, where dist( ℓ, c ) := min x ∈ ℓ ∥ x − c ∥ 2 . An ε -coreset for this problem is a weighted subset of sets in L that approximates this sum up to 1 ± ε multiplicative factor, for every set C of k centers. We prove that every such input set L has a small ε -coreset, and provide the first coreset construction for this problem and its variants. The coreset consists of O (log 2 n ) weighted line-sets from L , and is constructed in O ( n log n ) time for every fixed d, k ≥ 1 and ε ∈ (0 , 1) . The main technique is based on a novel reduction to a “fair clustering” of colored points to colored centers. We then provide a coreset for this coloring problem, which may be of independent interest. Open source code and experiments are also provided.