ACL 2021

Sequence Models for Computational Etymology of Borrowings

Winston Wu, Kevin Duh, David Yarowsky

Abstract

We computationally model the processes of word borrowing from a donor word to an incorporated word, and vice versa, by answering two questions: (1) what does a word look like when incorporated into another language, and, in the opposite direction, (2) where did a word come from? We employ neural sequence models, focusing on six specific borrowing relations: calques, partial calques, semantic loans, phono-semantic matches, transliterations, and generic borrowings. We experiment with several model variants, including LSTM encoder-decoders, copy attention, and Transformers. In both directions, we find that an LSTM model can beat strong baselines, with the quantity of data strongly influencing model performance.