CVPR 2025

Seek Common Ground While Reserving Differences: Semi-Supervised Image-Text Sentiment Recognition

Wuyou Xia, Guoli Jia, Sicheng Zhao, Jufeng Yang

Abstract

Multimodal sentiment analysis has attracted extensive research attention as increasing numbers of users share images and texts to express their emotions and opinions on social media. Collecting large amounts of labeled sentiment data is expensive and challenging because of the high cost of labeling and unavoidable label ambiguity. Semi-supervised learning (SSL) has been explored as a way to exploit abundant unlabeled data and reduce the demand for annotation. However, unlike in typical multimodal tasks, sentiment inconsistency between image and text degrades the performance of SSL algorithms. To address this issue, we propose SCRD, the first semi-supervised framework for image-text sentiment recognition. To better utilize the discriminative features of each modality, we decouple features into common and private parts. We then use the private features to train unimodal classifiers for enhanced modality-specific sentiment representation. Considering the complex relationships between modalities, we devise a modal selection-based attention module that adaptively identifies the dominant sentiment modality at the sample level to guide multimodal fusion. Furthermore, to prevent model predictions from over-relying on common features under the guidance of multimodal labels, we design a pseudo-label filtering strategy based on the matching degree of prediction and dominant modality. Extensive experiments and comparisons on five publicly available datasets demonstrate that SCRD outperforms state-of-the-art methods. Our code is available at https://github.com/wuyou-xia/Seek-Common-Ground-While-Reserving-Differences.
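To make the pseudo-label filtering idea concrete, the following is a minimal sketch of one plausible reading of it, not the authors' implementation. It assumes that a pseudo-label is kept only when the fused (multimodal) prediction is confident and agrees with the unimodal classifier of the sample's dominant modality. The function name `filter_pseudo_labels`, the `threshold` value of 0.95, and the hard agreement test are all illustrative assumptions; the abstract does not specify how the matching degree is computed.

```python
from typing import List, Optional, Sequence


def argmax(probs: Sequence[float]) -> int:
    """Return the index of the largest probability."""
    return max(range(len(probs)), key=lambda i: probs[i])


def filter_pseudo_labels(
    fused_probs: Sequence[Sequence[float]],
    image_probs: Sequence[Sequence[float]],
    text_probs: Sequence[Sequence[float]],
    dominant: Sequence[str],
    threshold: float = 0.95,  # assumed confidence cutoff, not from the paper
) -> List[Optional[int]]:
    """Return a pseudo-label per unlabeled sample, or None if it is rejected.

    Hypothetical rule: keep the fused prediction only if it is confident
    and matches the prediction of the dominant modality's classifier.
    """
    labels: List[Optional[int]] = []
    for fused, img, txt, dom in zip(fused_probs, image_probs, text_probs, dominant):
        pred = argmax(fused)
        # Pick the unimodal prediction of the modality flagged as dominant.
        unimodal = img if dom == "image" else txt
        confident = fused[pred] >= threshold
        agrees = argmax(unimodal) == pred
        labels.append(pred if confident and agrees else None)
    return labels
```

Under this sketch, a sample whose fused prediction is confident but contradicts its dominant modality is discarded, which is one way to keep the fused classifier from leaning only on features shared by both modalities.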