WWW2026

ATGFB-MFF: Adaptive Text-Guided Fiber Bundle Feature Fusion with LLMs for Multimodal Sentiment Analysis and Emotion Recognition in Conversations

Zhaowei Liu, Sheng Liu, Weiqing Yan, Peng Song, Yongchao Song, Rufei Gao

Abstract

Multimodal Sentiment Analysis (MSA) and Emotion Recognition in Conversations (ERC) have rapidly developed into pivotal tasks in artificial intelligence. Large Language Models (LLMs) offer powerful semantic reasoning and computational capabilities, showing great potential for understanding emotional content. However, when applied to multimodal sentiment data, LLMs face significant challenges: they cannot directly process heterogeneous data, they cope poorly with feature misalignment, and their cross-modal fusion is suboptimal. To address these challenges, we propose ATGFB-MFF, a novel multimodal sentiment inference framework grounded in fiber bundle theory. The method decomposes multimodal features into an adaptive text-guided shared semantic space and fiber offset spaces to achieve structured alignment and fusion. The fused features are then converted into structured pseudo-token sequences for effective inference via frozen LLMs. We also introduce two loss functions, a shared space consistency loss and a fiber offset regularization loss, to improve representation stability. Extensive experiments on four benchmark datasets demonstrate that ATGFB-MFF consistently outperforms state-of-the-art baselines. These results highlight the efficacy of geometric structural modeling in unlocking the potential of LLMs for multimodal sentiment inference.
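The decomposition described in the abstract can be illustrated with a small sketch. This is not the authors' formulation, which the abstract does not spell out. It assumes each modality is already projected to a common dimension, that the text-guided shared part is a simple blend of the modality vector with the text vector, and that both losses are plain mean-squared terms. All function names, the `weights` parameter, and the pseudo-token layout are hypothetical.

```python
import numpy as np

def decompose(features, text_key, weights):
    """Split each modality vector into a shared part and a fiber offset.

    features: dict of modality name -> (d,) array in a common dimension
    text_key: name of the text modality used as the guide
    weights:  dict of modality name -> scalar in [0, 1] (assumed guidance strength)
    """
    text = features[text_key]
    shared, offsets = {}, {}
    for name, x in features.items():
        w = weights[name]
        # Assumed text-guided shared component: blend of text and modality.
        shared[name] = w * text + (1.0 - w) * x
        # Fiber offset: the residual left after removing the shared component.
        offsets[name] = x - shared[name]
    return shared, offsets

def shared_consistency_loss(shared):
    # Assumed form: mean squared deviation of shared vectors from their centroid.
    stacked = np.stack(list(shared.values()))
    centroid = stacked.mean(axis=0)
    return float(((stacked - centroid) ** 2).mean())

def offset_regularization_loss(offsets):
    # Assumed form: mean squared magnitude of the fiber offsets.
    stacked = np.stack(list(offsets.values()))
    return float((stacked ** 2).mean())

def to_pseudo_tokens(shared, offsets, order):
    # Hypothetical layout: one shared-centroid token followed by one
    # offset token per modality, giving an array of shape (1 + M, d).
    centroid = np.stack([shared[m] for m in order]).mean(axis=0)
    return np.stack([centroid] + [offsets[m] for m in order])
```

In this sketch every modality vector equals its shared part plus its offset, so no information is discarded by the split. The two losses pull in the direction the abstract describes: shared parts are encouraged to agree across modalities while offsets are kept small. In a full system the pseudo-token array would still need a learned projection into the frozen LLM's embedding space, which is omitted here.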