WWW2024

DualCL: Principled Supervised Contrastive Learning as Mutual Information Maximization for Text Classification

Junfan Chen, Richong Zhang, Yaowei Zheng, Qianben Chen, Chunming Hu, Yongyi Mao


Abstract

Text classification is a fundamental task in web content mining. Building text classification applications with pre-trained language models (PLMs) and a contrastive learning objective has attracted significant research interest. Although the existing supervised contrastive learning (SCL) approach has achieved leading performance in text classification, it lacks guiding principles that ensure effective training and easy deployment, which limits its use in practice. In this paper, we propose three principles for designing an effective SCL approach: it should be parameter-free, augmentation-easy, and label-aware. Building on these principles, we develop DualCL, a dual contrastive learning framework that captures the mutual relationship between text representations and classifier parameters. The implementation of DualCL is theoretically motivated by a derived lower bound for mutual information maximization. DualCL uses the PLM to generate classifier parameters, which serve simultaneously as the classifier for the input text and as augmented views of that text for supervised contrastive learning. Extensive experiments on diverse text classification datasets show that DualCL learns better text representations and consistently outperforms baseline models.

CCS CONCEPTS

• Computing methodologies → Natural language processing.
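The dual use of the generated classifier parameters described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `dual_contrastive_sketch`, the temperature `tau`, the cosine normalization, and the equal weighting of the two contrastive directions are all assumptions made for illustration, and the arrays `z` and `theta` stand in for outputs that a PLM would produce.

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def dual_contrastive_sketch(z, theta, labels, tau=0.1):
    """Hypothetical sketch of a dual contrastive objective.

    z      : (N, d)    text representations (assumed to come from the PLM)
    theta  : (N, K, d) per-example classifier parameters (assumed PLM output)
    labels : (N,)      integer class labels
    Returns (cross-entropy loss, dual contrastive loss).
    """
    n = z.shape[0]
    idx = np.arange(n)

    # Use 1: classification. Each text is scored by its own generated classifier.
    logits = np.einsum("nkd,nd->nk", theta, z)
    ce = -log_softmax(logits)[idx, labels].mean()

    # Use 2: augmented view. Take the classifier row of the gold label.
    theta_star = theta[idx, labels]                      # (N, d)
    z_n = z / np.linalg.norm(z, axis=1, keepdims=True)   # assumed cosine similarity
    t_n = theta_star / np.linalg.norm(theta_star, axis=1, keepdims=True)
    sim = z_n @ t_n.T / tau                              # sim[i, j] = z_i . theta*_j / tau

    # Positives are examples that share the anchor's label (including itself).
    pos = (labels[:, None] == labels[None, :]).astype(float)

    # Direction 1: text z_i as anchor, classifier views theta*_j as candidates.
    loss_z = -(pos * log_softmax(sim, axis=1)).sum(axis=1) / pos.sum(axis=1)
    # Direction 2: classifier view theta*_i as anchor, texts z_j as candidates.
    loss_t = -(pos * log_softmax(sim.T, axis=1)).sum(axis=1) / pos.sum(axis=1)

    return ce, loss_z.mean() + loss_t.mean()

# Toy usage with random stand-ins for PLM outputs.
rng = np.random.default_rng(0)
z_demo = rng.normal(size=(6, 8))
theta_demo = rng.normal(size=(6, 3, 8))
labels_demo = np.array([0, 1, 2, 0, 1, 2])
ce_demo, cl_demo = dual_contrastive_sketch(z_demo, theta_demo, labels_demo)
```

In this sketch no separate classification head is trained, which is one reading of the parameter-free principle, and the label enters both the classification and the contrastive terms, which is one reading of label-aware. How the paper weights the terms and derives them from the mutual information bound is not stated in the abstract.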