ACL2025

Multimodal Invariant Sentiment Representation Learning

Aoqiang Zhu, Min Hu, Xiaohua Wang, Jiaoyun Yang, Yiming Tang, Ning An

4 citations

Abstract

Multimodal Sentiment Analysis (MSA) integrates diverse modalities to overcome the limitations of unimodal data. However, existing MSA datasets commonly have significant sentiment distribution imbalances and cross-modal sentiment conflicts, which hinder performance improvement. This paper shows that distributional discrepancies and sentiment conflicts can be incorporated into the model training to learn stable multimodal invariant sentiment representation. To this end, we propose a Multimodal Invariant Sentiment Representation Learning (MISR) method. Specifically, we first learn a stable and consistent multimodal joint representation in the latent space of Gaussian distribution based on distributional constraints. Then. under invariance constraint, we further learn multimodal invariant sentiment representations from multiple distributional environments constructed by the joint representation and unimodal data, achieving robust and efficient MSA performance. Extensive experiments demonstrate that MISR significantly enhances MSA performance and achieves new state-of-the-art. The code has been released at https://github.com/aoqzhu/MISR .