ACL 2022

Non-Autoregressive Sequence Generation

Jiatao Gu, Xu Tan

Abstract

State-of-the-art sequence generation models are mostly autoregressive (AR; Vaswani et al., 2017; Brown et al., 2020), where each generation step depends on the previously generated tokens. However, such models are inherently sequential, which leads to high latency at inference time. They also suffer from the label bias problem (Lafferty et al., 2001), due to locally normalized search steps, and the exposure bias problem (Bengio et al., 2015), due to the mismatch between training and inference. Recently, increasing attention has been paid to modeling sequence generation in a non- or semi-autoregressive manner, which attempts to generate all or part of the output sequence in parallel to speed up decoding and avoid potential issues (e.g., label bias, exposure bias) of autoregressive generation. In this tutorial, for simplicity, we refer to both approaches as non-autoregressive (NAR) sequence generation models. NAR models have been explored in many sequence generation tasks for text (e.g., neural machine translation (Gu et al., 2018), text summarization (Gu et al., 2019), and text error correction (Awasthi et al., 2019; Leng et al., 2021b)) and speech (e.g., speech recognition (Chen et al., 2019) and speech synthesis (Ren et al., 2019)). However, naive NAR models still struggle to close the performance gap with state-of-the-art autoregressive models because of a lack of modeling power. This tutorial will provide a thorough introduction to and review of non-autoregressive sequence generation, including the background, the capabilities and limits, popular methods that improve NAR models, and their applications to text and speech generation.
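
As a rough illustration of the contrast drawn above, the two modeling choices are commonly written as the factorizations below. The notation is an assumption introduced here for illustration only (source sequence $x$, target sequence $y = (y_1, \dots, y_T)$, target length $T$), and the NAR form shown is the fully parallel case with a separately predicted length; semi-autoregressive variants fall between the two.

```latex
% Autoregressive (AR): each token is conditioned on all previously generated tokens,
% so the T factors must be evaluated one after another at inference time.
p_{\mathrm{AR}}(y \mid x) = \prod_{t=1}^{T} p\left(y_t \mid y_{<t}, x\right)

% Non-autoregressive (NAR), fully parallel case: tokens are treated as conditionally
% independent given the source, so all T positions can be predicted at once.
% The length term p(T | x) is an assumed modeling choice, since no end-of-sequence
% token is generated step by step.
p_{\mathrm{NAR}}(y \mid x) = p\left(T \mid x\right) \prod_{t=1}^{T} p\left(y_t \mid x\right)
```

Dropping the dependence on $y_{<t}$ is what allows parallel decoding, and the same conditional independence assumption is one way to read the "lack of modeling power" noted for naive NAR models.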