EMNLP2020
EXAMS: A Multi-subject High School Examinations Dataset for Cross-lingual and Multilingual Question Answering
Momchil Hardalov, Todor Mihaylov, Dimitrina Zlatkova, Yoan Dinkov, Ivan Koychev, Preslav Nakov
7 citations
Abstract
We propose Eχαµs -a new benchmark dataset for cross-lingual and multilingual question answering for high school examinations. We collected more than 24,000 highquality high school exam questions in 16 languages, covering 8 language families and 24 school subjects from Natural Sciences and Social Sciences, among others. Eχαµs offers a fine-grained evaluation framework across multiple languages and subjects, which allows precise analysis and comparison of various models. We perform various experiments with existing top-performing multilingual pre-trained models and we show that Eχαµs offers multiple challenges that require multilingual knowledge and reasoning in multiple domains. We hope that Eχαµs will enable researchers to explore challenging reasoning and knowledge transfer methods and pretrained models for school question answering in various languages which was not possible before. The data, code, pre-trained models, and evaluation are available at http:// github.com/mhardalov/exams-qa .