ACL 2025

BelarusianGLUE: Towards a Natural Language Understanding Benchmark for Belarusian

Maksim Aparovich, Volha Harytskaya, Vladislav Poritski, Oksana Volchek, Pavel Smrz

Abstract

In the era of multilingual large language models (LLMs), it is still challenging to evaluate the models' understanding of lower-resourced languages, which motivates further development of expert-crafted natural language understanding benchmarks. We introduce BelarusianGLUE, a natural language understanding benchmark for Belarusian, an East Slavic language, with ≈15K instances in five tasks: sentiment analysis, linguistic acceptability, word in context, Winograd schema challenge, and textual entailment. A systematic evaluation of BERT models and LLMs against this novel benchmark reveals that both types of models approach human-level performance on easier tasks, such as sentiment analysis, but that a significant gap between model and human performance remains on a harder task, the Winograd schema challenge. We find the optimal choice of model type to be task-specific: for example, BERT models underperform on the textual entailment task but are competitive on linguistic acceptability. We release the datasets 1 and evaluation code. 2