ACL2025

Chart Question Answering from Real-World Analytical Narratives

Maeve Hutchinson, Radu Jianu, Aidan Slingsby, Jo Wood, Pranava Madhyastha

Abstract

We present a new dataset for chart question answering (CQA) constructed from visualization notebooks. The dataset features real-world, multi-view charts paired with natural language questions grounded in analytical narratives. Unlike prior benchmarks, our data reflects ecologically valid reasoning workflows. Benchmarking state-of-the-art multimodal large language models reveals a significant performance gap, with GPT-4.1 achieving an accuracy of 69.3%, underscoring the challenges posed by this more authentic CQA setting.

Dataset available at: https://huggingface.co/datasets/maevehutch/realworld-chartqa

and composition of the dataset. Finally, we report some initial benchmarking results using state-of-the-art MLLMs.

How do data cases compare with respect to attribute A?
Apple vs Huawei market share? → Apple's is larger
Ratings between 4.6-4.8 and 4.2-4.4? → 4.6-4.8 has more
Shanghai vs Beijing ridership? → Shanghai's is higher