ACL 2020

Non-Linear Instance-Based Cross-Lingual Mapping for Non-Isomorphic Embedding Spaces

Goran Glavaš, Ivan Vulić


Abstract

We present INSTAMAP, an instance-based method for learning projection-based cross-lingual word embeddings. Unlike prior work, it does not learn a single global linear projection. INSTAMAP is a non-parametric model that learns a non-linear projection by iterating two steps: (1) finding a globally optimal rotation of the source embedding space with the Kabsch algorithm, and then (2) moving each point along an instance-specific translation vector estimated from the translation vectors of the point's nearest neighbours in the training dictionary. We report performance gains of INSTAMAP over four representative state-of-the-art projection-based models on bilingual lexicon induction across 28 diverse language pairs. The improvements are most prominent for more distant language pairs, i.e., language pairs whose monolingual embedding spaces are non-isomorphic.
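The two alternating steps described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and several details are our own simplifying assumptions:

- The rotation is computed as the SVD-based orthogonal Procrustes solution (Kabsch-style, but without centering or the reflection correction).
- Nearest neighbours are found by cosine similarity.
- Each instance-specific translation is the unweighted mean of its k neighbours' translation vectors.
- The number of iterations is fixed.
- The names `kabsch_rotation` and `instamap_sketch`, and the parameters `k` and `n_iter`, are hypothetical.

```python
import numpy as np


def kabsch_rotation(X, Y):
    """Orthogonal W minimising ||X @ W - Y||_F for row-paired embeddings.

    Assumption: SVD-based orthogonal Procrustes solution (no centering,
    no determinant/reflection correction).
    """
    # SVD of the cross-covariance X^T Y; the optimal orthogonal map is U V^T
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def instamap_sketch(src, dict_idx, Y, k=3, n_iter=2):
    """Hypothetical sketch of rotation + instance-specific translation.

    src      -- all source embeddings, shape (n, d)
    dict_idx -- row indices of the training-dictionary source words
    Y        -- target embeddings paired with src[dict_idx], shape (m, d)
    """
    Z = src.copy()
    for _ in range(n_iter):
        # Step 1: one global rotation of the whole (current) source space
        W = kabsch_rotation(Z[dict_idx], Y)
        Z = Z @ W
        D = Z[dict_idx]
        # Translation vectors of the dictionary pairs after rotation
        T = Y - D
        # Cosine similarity of every source point to every dictionary point
        Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
        Dn = D / np.linalg.norm(D, axis=1, keepdims=True)
        sims = Zn @ Dn.T
        # Indices of the k most similar dictionary points for each point
        nn = np.argsort(-sims, axis=1)[:, :k]
        # Step 2: move each point by the mean translation of its neighbours
        Z = Z + T[nn].mean(axis=1)
    return Z
```

If the dictionary targets are an exact rotation of the source vectors, the translation vectors are zero and the sketch reduces to that single rotation. With `k=1` and one iteration, every dictionary word lands exactly on its translation because its nearest dictionary neighbour is itself. Points outside the dictionary are still moved, each along a translation vector borrowed from its own neighbourhood, which is what makes the overall map non-linear.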