EMNLP 2022

Should We Ban English NLP for a Year?

Anders Søgaard

Cited by 17

Abstract

Around two thirds of NLP research at top venues is devoted exclusively to developing technology for speakers of English, most speech data comes from young urban speakers, and most texts used to train language models come from male writers. These biases feed into consumer technologies to widen existing inequality gaps, not only within, but also across, societies. Many have argued that it is almost impossible to mitigate inequality amplification. I argue that, on the contrary, it is quite simple to do so, and that counter-measures would have little-to-no negative impact, except, perhaps, in the very short term.

1 See Bender (2009), Ruder et al. (2022), as well as https://sjmielke.com/acl-language-diversity.htm

2 What explains the dominance of English in NLP research? Prestige seems to be an important factor. What is considered state of the art is what achieves the best performance on English. Consider, for example, the ACL Wiki's list of 'state-of-the-art' part-of-speech taggers