WWW2026

MePe: Rethinking Multimodal Chinese Idiom Reading Comprehension from a Metaphorical Perspective

Tongguan Wang, Junkai Li, Feiyue Xue, Hui Liu, Dongyu Su, Wangjun Huang, Ying Sha

Abstract

The multimodal Chinese idiom reading comprehension task aims to select the most appropriate idiom from a candidate list, given a text passage and an image. This requires the model to comprehend each Chinese idiom accurately, which is a significant challenge. Existing multimodal Chinese idiom reading comprehension methods primarily focus on aligning contextual text and images, while overlooking two key attributes of Chinese idioms: (1) there is a discrepancy between the literal and metaphorical meanings of Chinese idioms; (2) the same Chinese idiom has different meanings in different scenarios, which requires targeted understanding by experts who specialize in different fields. To address these challenges, we rethink the multimodal idiom reading comprehension task from a metaphorical perspective and propose a framework named MePe. First, we propose a literal-metaphorical semantic graph that systematically transforms the implicit discrepancy between the literal and metaphorical meanings of Chinese idioms into explicit, structured relationships, thereby making metaphorical meanings easier to understand. Then, we propose a mixture of idiom experts consisting of a literal idiom expert and a metaphorical idiom expert. Through division of labor and collaboration among these experts, the model captures the dual meanings of Chinese idioms across different scenarios. Finally, we employ the maximum mean discrepancy (MMD) to adjust the discrepancy between the literal and metaphorical semantic features of Chinese idioms. By mapping these features into a shared reproducing kernel Hilbert space, the model can better distinguish between the two based on contextual clues. Extensive experiments demonstrate that MePe achieves state-of-the-art performance on the MChIRC dataset.
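To make the maximum mean discrepancy (MMD) step more concrete, the following is a minimal sketch of the standard biased estimator of squared MMD with a Gaussian kernel, which implicitly maps features into a reproducing kernel Hilbert space. It is an illustration under stated assumptions, not the MePe implementation: the function names, the kernel choice, the bandwidth `sigma`, and the toy feature vectors are all hypothetical, and the abstract does not specify how the MMD term enters the training objective.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma=1.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between two sets of feature vectors."""
    return (
        mean_kernel(xs, xs, sigma)
        + mean_kernel(ys, ys, sigma)
        - 2.0 * mean_kernel(xs, ys, sigma)
    )


# Toy stand-ins (assumed, not from the paper) for literal and metaphorical
# semantic features of idioms.
literal_feats = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.1]]
metaphor_feats = [[1.0, 0.9], [0.8, 1.1], [1.0, 1.0]]

print(mmd_squared(literal_feats, literal_feats))   # identical sets
print(mmd_squared(literal_feats, metaphor_feats))  # well-separated sets
```

In this sketch, identical feature sets give a value of zero, while well-separated sets give a larger value, so the quantity can serve as a differentiable measure of how far apart the two feature distributions are.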