WWW2026
Expectation-Maximization Driven Contrastive Disentanglement for Generalized Category Discovery
Weiyi Yang, Richong Zhang, Junfan Chen, Jiawei Sheng, Lihong Wang
Abstract
Generalized Category Discovery (GCD) is a critical task in open-world computing scenarios that aims to automatically classify partially labeled data by recognizing both known and novel categories. However, existing GCD methods usually suffer from an inherent bias toward known categories, because pre-training is performed exclusively on them and no labeled data are available for novel categories. This bias can cause significant misclassification and clustering errors for novel categories. Although recent approaches leverage pseudo-label training and contrastive learning to mitigate this bias, they still lack explicit supervision for disentangling novel and known categories, which leads to performance bottlenecks. To address these limitations, we propose an Expectation-Maximization-driven Contrastive Disentanglement (EMCD) framework that explicitly disentangles novel and known categories. In particular, we formulate the identification of novel categories as a latent variable estimation problem. Specifically, EMCD incorporates an EM-disentangling regularization that softly identifies novel-category samples and a consistency regularization that enhances generalization. In addition, we leverage dual contrastive constraints, namely a cluster-sample contrastive constraint and a sample-sample contrastive constraint, to pull together samples of novel categories while pushing apart ambiguous samples near decision boundaries. Empirical results on three commonly used datasets demonstrate that our model is effective and outperforms previous state-of-the-art methods. Our code is available at https://github.com/YWY-only/EMCD.
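To make "identifying novel categories as latent variable estimation" concrete, the sketch below treats each unlabeled sample's known/novel status as a binary latent variable and estimates its posterior with EM. This is a generic illustration under stated assumptions, not the EMCD implementation. It assumes each sample already has a scalar score (hypothetically, its maximum similarity to known-class prototypes, where lower means more likely novel). It also substitutes a plain two-component 1-D Gaussian mixture for whatever likelihood the paper actually uses. The function name `em_novelty_posterior` and its parameters are hypothetical.

```python
import numpy as np


def _e_step(x, pi, mu, var):
    """E-step: posterior responsibilities of each component plus the log-likelihood."""
    dens = pi * np.exp(-((x[:, None] - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
    total = np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)  # avoid division by zero
    return dens / total, float(np.log(total).sum())


def em_novelty_posterior(scores, n_iter=200, tol=1e-8):
    """Hypothetical sketch: soft P(z = novel | score) from a 2-component 1-D GMM fitted by EM.

    Assumption: `scores` are per-sample similarities to known-class prototypes,
    so the lower-mean component is interpreted as "novel".
    """
    x = np.asarray(scores, dtype=float)
    # Initialise one component near the low scores and one near the high scores.
    mu = np.array([np.percentile(x, 25), np.percentile(x, 75)])
    var = np.full(2, x.var() + 1e-6)
    pi = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        resp, ll = _e_step(x, pi, mu, var)  # E-step under current parameters
        if ll - prev_ll < tol:               # EM never decreases the likelihood
            break
        prev_ll = ll
        # M-step: re-estimate mixing weights, means and variances from soft counts.
        nk = resp.sum(axis=0) + 1e-12
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6  # variance floor
    resp, _ = _e_step(x, pi, mu, var)        # posterior under the final parameters
    novel = int(np.argmin(mu))               # lower-similarity component = "novel"
    return resp[:, novel]
```

The returned posteriors are soft, so a downstream loss could weight each sample by its probability of being novel instead of committing to a hard threshold. This matches the abstract's "softly identify" phrasing, although how EMCD actually consumes these estimates is not specified here.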