EMNLP 2025

QFrCoLA: a Quebec-French Corpus of Linguistic Acceptability Judgments

David Beauchemin, Richard Khoury

Abstract

Large language models (LLMs) perform outstandingly in various downstream tasks. However, there is limited understanding of how these models internalize linguistic knowledge, so various linguistic benchmarks have recently been proposed to facilitate syntactic evaluation of language models (LMs) across languages. This paper introduces QFrCoLA (Quebec-French Corpus of Linguistic Acceptability Judgments), a normative binary acceptability judgment dataset comprising 25,153 in-domain and 2,675 out-of-domain sentences. Our study uses QFrCoLA and seven other binary linguistic acceptability judgment corpora to benchmark eight LMs. The results show that, on average, fine-tuned Transformer-based LMs are strong baselines for most languages, and that LLMs used as zero-shot binary classifiers perform worse than the naive baseline on the task. On the QFrCoLA benchmark specifically, a fine-tuned Transformer-based LM outperformed, on average, the other methods tested. The results also show that the pretrained cross-lingual LLMs selected for our experiments do not seem to have acquired linguistic judgment capabilities for Quebec French during pretraining. Finally, our experimental results on QFrCoLA show that our dataset, built from examples that illustrate linguistic norms rather than speakers' feelings, is similar to linguistic acceptability judgment. It is a challenging dataset that can be used to benchmark LMs on their linguistic judgment capabilities.