STOC2025

Solving the Correlation Cluster LP in Sublinear Time

Nairen Cao, Vincent Cohen-Addad, Euiwoong Lee, Shi Li, David Rasmussen Lolck, Alantha Newman, Mikkel Thorup, Lukas Vogl, Shuyi Yan, Hanwen Zhang

摘要

Correlation Clustering is a fundamental and widely-studied problem in unsupervised learning and data mining. The input is a graph and the goal is to construct a clustering minimizing the number of inter-cluster edges plus the number of missing intra-cluster edges. CCL+24 introduced the cluster LP for Correlation Clustering, which they argued captures the problem much more succinctly than previous linear programming formulations. However, the cluster LP has exponential size, with a variable for every possible set of vertices in the input graph. Nevertheless, CCL+24 showed how to find a feasible solution for the cluster LP in time O(npoly(1/ε))O(n^{\text{poly}(1/ε)}) with objective value at most (1+ε)(1+ε) times the value of an optimal solution for the respective Correlation Clustering instance. Furthermore, they showed how to round a solution to the cluster LP, yielding a (1.485+ε)(1.485+ε)-approximation algorithm for the Correlation Clustering problem. The main technical result of this paper is a new approach to find a feasible solution for the cluster LP with objective value at most (1+ε)(1+ε) of the optimum in time O~(2poly(1/ε)n)\widetilde O(2^{\text{poly}(1/ε)} n), where nn is the number of vertices in the graph. We also show how to implement the rounding within the same time bounds, thus achieving a fast (1.485+ε)(1.485+ε)-approximation algorithm for the Correlation Clustering problem. This bridges the gap between state-of-the-art methods for approximating Correlation Clustering and the recent focus on fast algorithms.