ACL 2023

Hybrid-Regressive Paradigm for Accurate and Speed-Robust Neural Machine Translation

Qiang Wang, Xinhui Hu, Ming Chen

Abstract

This study provides empirical evidence that non-autoregressive translation (NAT) is less robust to decoding batch size and hardware settings than autoregressive translation (AT). To address this issue, we show through synthetic experiments that incorporating a small number of AT predictions can significantly reduce the performance gap between AT and NAT. Building on this finding, we propose hybrid-regressive translation (HRT), a two-stage translation prototype that combines the strengths of AT and NAT. Specifically, HRT first generates a discontinuous sequence autoregressively (e.g., predicting one token in every k tokens, k > 1), and then fills in all previously skipped tokens simultaneously in a non-autoregressive manner. Experimental results on five translation tasks show that HRT achieves translation quality comparable to AT while providing at least 1.5x faster inference, irrespective of batch size and device. Moreover, HRT retains the desirable characteristics of AT in the deep-encoder-shallow-decoder architecture, enabling further speed improvements without sacrificing BLEU scores.
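The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the authors' implementation: the names `skip_positions`, `hybrid_decode`, `at_step`, and `nat_fill` are hypothetical, the target length is assumed to be known in advance, the autoregressive stage is assumed to predict the k-th, 2k-th, ... tokens, and both predictors are stand-ins that simply read from a reference sentence.

```python
from typing import Callable, List

MASK = "<mask>"  # placeholder for tokens skipped in stage 1 (hypothetical symbol)


def skip_positions(length: int, k: int) -> List[int]:
    """0-based positions predicted autoregressively: the k-th, 2k-th, ... tokens."""
    return list(range(k - 1, length, k))


def hybrid_decode(
    length: int,
    k: int,
    at_step: Callable[[List[str], int], str],
    nat_fill: Callable[[List[str]], List[str]],
) -> List[str]:
    """Toy two-stage decoding: sparse autoregressive pass, then one parallel fill."""
    positions = skip_positions(length, k)

    # Stage 1: autoregressive pass over every k-th position only.
    # Each step sees the skeleton tokens generated so far.
    skeleton: List[str] = []
    for pos in positions:
        skeleton.append(at_step(skeleton, pos))

    # Interleave the skeleton with masks for the skipped positions.
    draft = [MASK] * length
    for pos, token in zip(positions, skeleton):
        draft[pos] = token

    # Stage 2: a single non-autoregressive call proposes all masked tokens at once.
    proposal = nat_fill(draft)
    return [p if d == MASK else d for d, p in zip(draft, proposal)]


# Stand-in predictors that read from a reference sentence (assumption for the demo).
reference = "the cat sat on the mat".split()
result = hybrid_decode(
    length=len(reference),
    k=2,
    at_step=lambda prefix, pos: reference[pos],
    nat_fill=lambda draft: list(reference),
)
```

In this toy, with k = 2 and a six-token target, the autoregressive stage runs three sequential steps instead of six, and the remaining three tokens are filled in by one parallel call. That reduction in sequential steps is the intuition behind the speedup claimed in the abstract.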