ACL 2024

Non-Autoregressive Machine Translation as Constrained HMM

Haoran Li, Zhanming Jie, Wei Lu


Abstract

In non-autoregressive translation (NAT), directed acyclic Transformers (DAT) (Huang et al., 2022c) have been shown to achieve performance comparable to that of autoregressive Transformers. In this paper, we first show that DAT is essentially a fully connected left-to-right Hidden Markov Model (HMM) (Baum et al., 1970), with the source and target sequences as observations and the token positions as latent states. Although generative models such as HMMs do not suffer from label bias (Lafferty et al., 2001) in traditional settings (e.g., sequence labeling), we argue that the left-to-right HMM in NAT may still encounter this issue because observations are missing at inference time. To combat label bias, we propose two constrained HMMs: 1) Adaptive Window HMM, which explicitly balances the number of outgoing transitions at different states; 2) Bi-directional HMM, i.e., a combination of left-to-right and right-to-left HMMs, whose uni-directional components can implicitly regularize each other's biases via shared parameters. Experimental results on WMT'14 En↔De and WMT'17 Zh↔En demonstrate that our methods achieve better or comparable performance relative to the original DAT under various decoding methods. We also show that our methods effectively reduce the impact of label bias.
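To make the HMM view concrete, here is a minimal sketch of a left-to-right HMM whose latent states are token positions. It assumes (as in DAT) that a path starts at the first position, ends at the last, and only moves to later positions, and it uses uniform, locally normalized transitions purely for illustration. The `window` parameter is a hypothetical fixed window, a simplification and not the paper's adaptive rule; all function names are illustrative.

```python
import math
from typing import List, Optional

NEG_INF = float("-inf")


def logsumexp(xs: List[float]) -> float:
    # Numerically stable log(sum(exp(x))); returns -inf when every term is -inf.
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def allowed(i: int, j: int, window: Optional[int]) -> bool:
    # Left-to-right: only forward jumps. A hypothetical fixed window caps the jump length.
    return j > i and (window is None or j - i <= window)


def out_degrees(num_states: int, window: Optional[int] = None) -> List[int]:
    # Number of outgoing transitions available at each state (position).
    return [sum(allowed(i, j, window) for j in range(num_states)) for i in range(num_states)]


def uniform_log_trans(num_states: int, window: Optional[int] = None) -> List[List[float]]:
    # Locally normalized: each state splits its mass evenly over its allowed successors.
    rows = []
    for i in range(num_states):
        succ = [j for j in range(num_states) if allowed(i, j, window)]
        row = [NEG_INF] * num_states
        for j in succ:
            row[j] = -math.log(len(succ))
        rows.append(row)
    return rows


def forward_log_prob(log_trans: List[List[float]], log_emit: List[List[float]]) -> float:
    """log P(y_1..y_T), summed over paths 0 = a_1 < ... < a_T = L-1.

    log_trans[i][j]: log transition probability (-inf where disallowed).
    log_emit[i][t]: log probability that position i emits target token y_t.
    """
    num_states, length = len(log_trans), len(log_emit[0])
    alpha = [NEG_INF] * num_states
    alpha[0] = log_emit[0][0]  # assumed: every path starts at the first position
    for t in range(1, length):
        alpha = [
            logsumexp([alpha[i] + log_trans[i][j] for i in range(num_states)]) + log_emit[j][t]
            for j in range(num_states)
        ]
    return alpha[num_states - 1]  # assumed: every path ends at the last position
```

For six positions, `out_degrees(6)` is `[5, 4, 3, 2, 1, 0]`: with locally normalized transitions, position 4 sends all of its mass to position 5 whatever the evidence, while position 0 spreads its mass over five successors. Capping the jump length, e.g. `out_degrees(6, window=2)` giving `[2, 2, 2, 2, 1, 0]`, is one simple way to even out these counts; the Adaptive Window HMM described above balances them with its own rule.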