EMNLP 2025

XLQA: A Benchmark for Locale-Aware Multilingual Open-Domain Question Answering

Keon-Woo Roh, Yeong-Joon Ju, Seong-Whan Lee

Abstract

Large Language Models (LLMs) have shown significant progress in Open-Domain Question Answering (ODQA), yet most evaluations focus on English and assume locale-invariant answers across languages. This assumption neglects the cultural and regional variations that affect question understanding and answers, leading to biased evaluation in multilingual benchmarks. To address these limitations, we introduce XLQA, a novel benchmark explicitly designed for locale-sensitive multilingual ODQA. XLQA contains 3,000 English seed questions expanded to eight languages, with careful filtering for semantic consistency and human-verified annotations distinguishing locale-invariant and locale-sensitive cases. Our evaluation of five state-of-the-art multilingual LLMs reveals notable failures on locale-sensitive questions, exposing gaps between English and other languages due to a lack of locale-grounding knowledge. We provide a systematic framework and scalable methodology for assessing multilingual QA under diverse cultural contexts, offering a critical resource to advance the real-world applicability of multilingual ODQA systems. Our findings suggest that disparities in training data distribution contribute to differences in both linguistic competence and locale-awareness across models.

https://github.com/ro-ko/XLQA