ICLR2026
Efficient Testing for Correlation Clustering: Improved Algorithms and Optimal Bounds
Chengyuan Deng, Jie Gao, Songhua He, Chen Wang
摘要
Correlation clustering is an important unsupervised learning problem with broad applications. In this problem, we are given a labeled complete graph , and the optimal clustering is defined as a partition of the vertices that minimizes the edges between clusters and edges within clusters. We investigate efficient algorithms to test the cost of correlation clustering: here, we want to know whether the graph could be (nearly) perfectly clustered (with cost) or is far away from admitting any perfect clustering. The problem has attracted significant attention aimed at modern large-scale applications, and the state-of-the-art results use queries and time (up to log factors) to decide whether a graph is perfectly clusterable or needs to flip labels of edges to become clusterable. In this paper, we improve this bound significantly by designing an algorithm that uses queries and time. Furthermore, we derive the first algorithm that tests the cost for the special setting of correlation clustering with clusters with queries and time for constant . Finally, for the special case of , which corresponds to the strong structure balance problem in social networks, we obtain tight bounds of queries -- the first set of tight bounds in these problems. We conduct experiments on simulated and real-world datasets, and empirical results demonstrate the advantages of our algorithms.