NeurIPS 2021

Hierarchical Clustering: O(1)-Approximation for Well-Clustered Graphs

Bogdan-Adrian Manghiuc, He Sun

9 citations

Abstract

Hierarchical clustering studies a recursive partition of a data set into clusters of successively smaller size, and is a fundamental problem in data analysis. In this work we study the cost function for hierarchical clustering introduced by Dasgupta [Das16], and present two polynomial-time approximation algorithms. Our first result is an O(1)-approximation algorithm for graphs of high conductance; its simple construction bypasses the complicated recursive routines for finding sparse cuts known in the literature (e.g., [CAKMTM19, CC17]). Our second and main result is an O(1)-approximation algorithm for a wide family of graphs that exhibit a well-defined cluster structure. This result generalises the previous state-of-the-art [CAKMT17], which holds only for graphs generated from stochastic models. The significance of our work is demonstrated by an empirical analysis on both synthetic and real-world data sets, on which our algorithm outperforms the previously proposed algorithm for graphs with a well-defined cluster structure [CAKMT17].
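For readers unfamiliar with the objective, the following is a minimal sketch of how Dasgupta's cost can be evaluated for a given tree. It is an illustration only, not the algorithms of this paper. It uses the standard definition: each edge {u, v} contributes its weight multiplied by the number of leaves under the lowest common ancestor of u and v, and lower cost is better. The function name `dasgupta_cost`, the nested-tuple tree encoding, and the toy graph are all assumptions made for this example.

```python
from itertools import combinations


def dasgupta_cost(tree, weights):
    """Evaluate Dasgupta's cost of a hierarchical clustering tree.

    tree    -- nested tuples; any non-tuple value is a leaf (a vertex label).
    weights -- dict mapping frozenset({u, v}) to the edge weight w(u, v).
    (Both encodings are hypothetical choices for this sketch.)
    """
    total = 0.0

    def visit(node):
        nonlocal total
        if not isinstance(node, tuple):
            return [node]  # a leaf covers only itself
        child_leaves = [visit(child) for child in node]
        size = sum(len(leaves) for leaves in child_leaves)
        # Vertices in different children are first merged at this node,
        # so this node is their lowest common ancestor.
        for left, right in combinations(child_leaves, 2):
            for u in left:
                for v in right:
                    total += weights.get(frozenset((u, v)), 0.0) * size
        return [x for leaves in child_leaves for x in leaves]

    visit(tree)
    return total


# Toy graph (assumed for illustration): the unit-weight path a - b - c - d.
weights = {
    frozenset(("a", "b")): 1,
    frozenset(("b", "c")): 1,
    frozenset(("c", "d")): 1,
}

good_tree = (("a", "b"), ("c", "d"))  # keeps two of the three edges low in the tree
bad_tree = (("a", "c"), ("b", "d"))   # separates every edge at the root

good_cost = dasgupta_cost(good_tree, weights)
bad_cost = dasgupta_cost(bad_tree, weights)
```

In this toy instance the tree that groups adjacent vertices first has the lower cost, which matches the intuition behind the objective: heavily connected vertices should be separated as far down the tree as possible.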