ICML 2020

Correlation Clustering with Asymmetric Classification Errors

Jafar Jafarov, Sanchit Kalhan, Konstantin Makarychev, Yury Makarychev

15 citations

Abstract

In the Correlation Clustering problem, we are given a weighted graph $G$ with its edges labeled as "similar" or "dissimilar" by a binary classifier. The goal is to produce a clustering that minimizes the weight of "disagreements": the sum of the weights of "similar" edges across clusters and "dissimilar" edges within clusters. We study the correlation clustering problem under the following assumption: every "similar" edge $e$ has weight $\mathbf{w}_e \in [\alpha \mathbf{w}, \mathbf{w}]$ and every "dissimilar" edge $e$ has weight $\mathbf{w}_e \geq \alpha \mathbf{w}$ (where $\alpha \leq 1$ and $\mathbf{w} > 0$ is a scaling parameter). We give a $(3 + 2 \log_e (1/\alpha))$ approximation algorithm for this problem. This assumption captures well the scenario when classification errors are asymmetric. Additionally, we show an asymptotically matching Linear Programming integrality gap of $\Omega(\log 1/\alpha)$.
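To make the objective and the weight assumption concrete, here is a minimal Python sketch. It is not the paper's approximation algorithm; it only evaluates the weight of disagreements for a given clustering, checks the weight assumption, and computes the stated approximation factor. The names `disagreement_weight`, `satisfies_weight_assumption`, and `approximation_factor`, as well as the dictionary-based edge representation, are illustrative assumptions and do not come from the paper.

```python
import math

# Hypothetical representation: each edge (u, v) maps to (label, weight),
# where label is "similar" or "dissimilar".

def disagreement_weight(edges, clustering):
    """Sum the weights of "similar" edges across clusters and
    "dissimilar" edges within clusters."""
    total = 0.0
    for (u, v), (label, weight) in edges.items():
        same_cluster = clustering[u] == clustering[v]
        if label == "similar" and not same_cluster:
            total += weight
        elif label == "dissimilar" and same_cluster:
            total += weight
    return total

def satisfies_weight_assumption(edges, alpha, w):
    """Check that similar edges have weight in [alpha*w, w] and
    dissimilar edges have weight at least alpha*w."""
    for label, weight in edges.values():
        if label == "similar" and not (alpha * w <= weight <= w):
            return False
        if label == "dissimilar" and weight < alpha * w:
            return False
    return True

def approximation_factor(alpha):
    """The guarantee stated in the abstract: 3 + 2 ln(1/alpha)."""
    return 3 + 2 * math.log(1 / alpha)

# Toy instance (made-up weights, for illustration only).
edges = {
    ("a", "b"): ("similar", 1.0),
    ("b", "c"): ("similar", 0.5),
    ("a", "c"): ("dissimilar", 2.0),
}
one_cluster = {"a": 0, "b": 0, "c": 0}
split_c = {"a": 0, "b": 0, "c": 1}

print(disagreement_weight(edges, one_cluster))  # pays for the dissimilar edge (a, c)
print(disagreement_weight(edges, split_c))      # pays for the similar edge (b, c)
print(satisfies_weight_assumption(edges, alpha=0.5, w=1.0))
```

On this toy instance, separating `c` costs less than keeping all three vertices together, because the heavy "dissimilar" edge dominates. Note that the factor $3 + 2 \log_e(1/\alpha)$ reduces to 3 when $\alpha = 1$.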