ACL2024

Efficient Domain Adaptation for Non-Autoregressive Machine Translation

Wangjie You, Pei Guo, Juntao Li, Kehai Chen, Min Zhang

Abstract

Domain adaptation remains a challenge in Neural Machine Translation (NMT), even in the era of Large Language Models (LLMs). Existing non-parametric approaches such as nearest neighbor machine translation enable small Autoregressive Translation (AT) models to achieve efficient domain generalization and adaptation without updating parameters, but they leave the Non-Autoregressive Translation (NAT) counterparts under-explored. To fill this gap, we introduce Bi-kNN, an efficient domain adaptation approach that tailors the k-Nearest-Neighbor algorithm to NAT models. Specifically, we introduce an effective datastore construction and correlated updating strategies that conform to the parallel nature of NAT. Additionally, we train a meta-network that robustly integrates the kNN distribution with the NMT distribution during the iterative decoding process of NAT. Experimental results on four benchmark datasets demonstrate that Bi-kNN not only achieves significant improvements over the Base-NAT model (7.8 BLEU on average) but also exhibits enhanced efficiency. All implementation details of this work will be publicly accessible at https://anonymous/.

et al., 2023; Jiao et al., 2023) have highlighted that while LLMs demonstrate impressive translation capabilities for mainstream languages, their performance declines significantly when confronted with specific domains. How to handle NMT for specific domains in the era of LLMs, i.e., the domain adaptation setting, is still not well understood.

To explore this problem further, we first briefly compare different lines of possible solutions for NMT tasks, including a closed-source LLM (ChatGPT), an open-source LLM (LLaMA-2), and two types of Transformer-based expert models (i.e., autoregressive (AT) and non-autoregressive (NAT) models).
LLMs are included to calibrate domain-specific translation performance without any further task training, since such training may be prohibitively costly. Expert models are included to examine the domain adaptation capabilities of small-capacity models with supervised task training, e.g., models trained to convergence on the WMT training dataset but run at inference time on specific domains such as IT, Medical, Law, and Koran. Table 1 reports BLEU scores averaged over the four domains. We can see that although supervised expert models no longer outperform LLMs on high-resource languages, they still show promising domain adaptation potential. Meanwhile, considering that expert models are much more affordable in real-world usage at scale (e.g., over 10× speedup), it is worth figuring out how to solve NMT for specific domains with small experts.

There are already some effective strategies to enhance the domain adaptation performance of small models, particularly the non-parametric paradigm. For instance, Khandelwal et al. (2020) present a k-Nearest-Neighbor Machine Translation method, which uses a trained NMT model to construct a datastore consisting of (query: context representation; value: the corresponding target token) pairs from the training set, and then retrieves relevant tokens during inference to improve translation accuracy. This non-parametric approach equips the model with rapid domain adaptation and generalization abilities without the need for parameter adjustments. However, most existing methods are tailored for AT models, leaving the domain adaptation problem of NAT models under-explored.

To fill this gap, we introduce a domain adaptation approach for NAT models, namely Bidirectional-Iterative-kNN (Bi-kNN), an efficient method tailored for NAT models.
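As an illustration of the retrieval scheme that Khandelwal et al. (2020) describe for AT models, the following is a minimal sketch in pure Python. It is not the Bi-kNN method and not the authors' implementation. All function names (`build_datastore`, `knn_distribution`, `interpolate`), the toy two-dimensional vectors, and the hyperparameter values `k`, `temperature`, and `lam` are assumptions made for illustration. The stored context representations are called keys here. Retrieved neighbors are weighted by a softmax over negative squared L2 distance, and the resulting kNN distribution is linearly interpolated with the model distribution using a fixed weight.

```python
import math
from collections import defaultdict


def build_datastore(examples):
    """Store (context representation, target token) pairs as (key, value) entries."""
    return [(tuple(vec), tok) for vec, tok in examples]


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_distribution(datastore, query, k=2, temperature=1.0):
    """Token distribution from the k nearest keys, weighted by exp(-distance / T)."""
    scored = sorted(
        ((squared_l2(key, query), tok) for key, tok in datastore),
        key=lambda pair: pair[0],
    )[:k]
    # Subtracting the smallest distance leaves the normalized result unchanged
    # and avoids every weight underflowing to zero.
    min_dist = scored[0][0]
    weights = defaultdict(float)
    for dist, tok in scored:
        weights[tok] += math.exp(-(dist - min_dist) / temperature)
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def interpolate(p_knn, p_nmt, lam=0.5):
    """Mix the two distributions: lam * p_kNN + (1 - lam) * p_NMT."""
    vocab = set(p_knn) | set(p_nmt)
    return {
        tok: lam * p_knn.get(tok, 0.0) + (1.0 - lam) * p_nmt.get(tok, 0.0)
        for tok in vocab
    }


# Toy usage: an in-domain datastore pulls the prediction toward "server".
datastore = build_datastore([
    ([0.0, 0.0], "server"),
    ([0.1, 0.0], "server"),
    ([5.0, 5.0], "waiter"),
])
p_knn = knn_distribution(datastore, query=[0.05, 0.0], k=2)
p_nmt = {"server": 0.3, "waiter": 0.7}  # hypothetical out-of-domain model output
p_final = interpolate(p_knn, p_nmt, lam=0.5)
```

In this toy case both retrieved neighbors carry the token "server", so the interpolated distribution favors it even though the hypothetical model distribution preferred "waiter". A practical system would replace the exhaustive sort with an approximate nearest-neighbor index, and the fixed `lam` is exactly the kind of quantity that a learned meta-network could predict instead.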
Unlike k-Nearest-Neighbor NMT for AT models, NAT models struggle to produce accurate representations because parallel decoding provides insufficient context. To overcome this, we present a novel and effective framework for kNN-MT with NAT models, including (1) building a bidirectional datastore, (2) renewing the indecipherable datastore, (3) training a robust meta-network, and (4) iterative-kNN decoding. We conducted experiments on multiple domain-specific NMT tasks. Across four domains, our approach achieved an average improvement of 7.8 BLEU over the Base-NAT models and outperformed the specialized models trained on the corresponding datasets on most datasets. Furthermore, because it does not tune the parameters of pre-trained models, our method proved more efficient than straightforward fine-tuning and avoided catastrophic forgetting.

2 Related Work

Machine Translation with LLMs Large Language Models (LLMs), notably ChatGPT (Ouyang et al., 2022) and GPT-4 (Achiam et a