EMNLP 2025
Probing Logical Reasoning of MLLMs in Scientific Diagrams
Yufei Wang, Adriana Kovashka
Abstract
We examine how multimodal large language models (MLLMs) perform logical inference grounded in visual information. We first construct a dataset of food web and food chain images, paired with questions that follow seven structured templates involving progressively more complex reasoning. We show that complex reasoning about entities in the images remains challenging, even with elaborate prompts, and that visual information is underutilized.