ACL 2021

New Dataset and Strong Baselines for the Grammatical Error Correction of Russian

Viet Anh Trinh, Alla Rozovskaya

Abstract

Motivated by recent advances in grammatical error correction for English and by existing issues in the field, we describe a new resource: an annotated learner corpus of Russian extracted from the Lang-8 language learning website. We benchmark this new dataset with two grammatical error correction models that use state-of-the-art neural architectures. Results are reported on the newly created corpus and compared against performance on another, existing resource. We also evaluate the contribution of the Lang-8 training data to the grammatical error correction of Russian and perform a type-based analysis of the models. The expert annotations are available for research purposes.