EMNLP 2024

MiTTenS: A Dataset for Evaluating Gender Mistranslation

Kevin Robinson, Sneha Kudugunta, Romina Stella, Sunipa Dev, Jasmijn Bastings

Abstract

Translation systems, including foundation models capable of translation, can produce errors that result in gender mistranslations, and such errors create potential for harm. To measure the extent of such potential harms when translating into and out of English, we introduce a dataset, MiTTenS¹, covering 26 languages from a variety of language families and scripts, including several traditionally underrepresented in digital resources. The dataset is constructed with handcrafted passages that target known failure patterns, longer synthetically generated passages, and natural passages sourced from multiple domains. We demonstrate the usefulness of the dataset by evaluating both neural machine translation systems and foundation models, and show that all systems exhibit gender mistranslation and potential harm, even in high-resource languages.

¹ https://github.com/google-research-datasets/mittens

Bengali: সারা আমার খালা। আমি সত্যিই তার কৌতুক পছন্দ
English: Sarah is my aunt. I really like his jokes.

German: Tacetin Guntekin war Professor. Er war bekannt für seine Bücher…
English: Tacettin Güntekin was a professor. She was known for her books…

Spanish: Vino de inmediato cuando se enteró. Es una buena médica.
English: He came immediately when he heard about it. He is a good doctor.
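
When the intended gender of the person in a passage is known, errors on the English side of examples like those above can be flagged mechanically. The following is a minimal sketch in Python, not the evaluation procedure used in this work. It assumes a hypothetical per-passage annotation `expected_gender` and a small illustrative pronoun list (`PRONOUNS`); the function name `mismatched_pronouns` is likewise an assumption made for illustration.

```python
import re

# Hypothetical, illustrative pronoun lists; a real evaluation would need a vetted lexicon.
PRONOUNS = {
    "female": {"she", "her", "hers", "herself"},
    "male": {"he", "him", "his", "himself"},
}


def mismatched_pronouns(translation: str, expected_gender: str) -> list:
    """Return pronouns in an English translation that contradict the expected gender."""
    # Lowercase and split into alphabetic tokens so punctuation does not hide a pronoun.
    tokens = re.findall(r"[a-z]+", translation.lower())
    # Collect every pronoun that belongs to a gender other than the expected one.
    wrong = set().union(
        *(words for gender, words in PRONOUNS.items() if gender != expected_gender)
    )
    return [token for token in tokens if token in wrong]


# The Bengali example: the passage is about an aunt, so "his" is flagged.
print(mismatched_pronouns("Sarah is my aunt. I really like his jokes.", "female"))
# → ['his']
```

A check of this kind only applies to translation into English and assumes that every gendered pronoun in the passage refers to the annotated person, so it is best read as an illustration of the error type rather than a complete metric.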