AAAI 2026

M3UCD: A Multi-task Multimodal Metaphor Understanding Challenge Dataset for LLMs

Tianlong Zheng, Yating Yang, Rui Dong, Bo Ma, Lei Wang, Xi Zhou, Siru Miao, Osman Turghun

Abstract

Metaphors in natural language reflect fundamental cognitive processes such as analogical reasoning and categorisation, and are deeply rooted in everyday communication. Metaphor understanding is therefore an essential task for large language models (LLMs). We release the Metaphor Understanding Challenge Dataset (MUNCH), designed to evaluate the metaphor understanding capabilities of LLMs. The dataset provides over 10k paraphrases for sentences containing metaphor use, as well as 1.5k instances containing inapt paraphrases. The inapt paraphrases were carefully selected to serve as a control to determine whether the model indeed performs full metaphor interpretation or rather resorts to lexical similarity. All apt and inapt paraphrases were manually annotated. The metaphorical sentences cover natural metaphor uses across 4 genres (academic, news, fiction, and conversation), and they exhibit different levels of novelty. Experiments with LLaMA and GPT-3.5 demonstrate that MUNCH presents a challenging task for LLMs.

[...] comprehend metaphor, a fundamental linguistic and cognitive tool, is still poorly understood. Metaphors are linguistic expressions based on conceptual mappings between a target and a source domain (Lakoff and Johnson, 1980). The verb phrase "to stir excitement", for example, is based on the conceptual metaphor FEELING IS LIQUID, with FEELING (excitement) being the target domain and LIQUID (something that can be stirred) the source domain. The metaphor compares FEELING with LIQUID, introducing vividness into the description of an otherwise intangible emotional impact. Such cross-domain mappings are sets of systematic ontological correspondences, mapping concepts and their relational structure across distinct domains.