ACL2024

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Dong Zhang, Zhehuai Chen, Engsiong Chng


Abstract

Recent advances in large language models (LLMs) have pushed forward the development of multilingual speech and machine translation through reduced representation errors and the incorporation of external knowledge. However, both translation tasks typically rely on beam search decoding and top-1 hypothesis selection at inference time. These techniques struggle to fully exploit the rich information in the diverse N-best hypotheses, which makes them suboptimal for translation tasks that require a single, high-quality output sequence. In this paper, we propose a new generative paradigm for translation tasks, namely GenTranslate, which builds on LLMs to generate better results from the diverse translation versions in the N-best list. Leveraging the rich linguistic knowledge and strong reasoning abilities of LLMs, our new paradigm can integrate the diverse N-best candidates to generate a higher-quality translation. Furthermore, to support LLM finetuning, we build and release a HypoTranslate dataset that contains over 592K hypotheses-translation pairs in 11 languages. Experiments on various speech and machine translation benchmarks (e.g., FLEURS, CoVoST-2, WMT) demonstrate that GenTranslate significantly outperforms the state-of-the-art model.

… & MT) hold significant practical importance for global communication. Like other NLP tasks, translation has made notable progress thanks to recent advances in LLMs (Zhang et al., 2023a; Lyu et al., 2023). In speech translation, Whisper (Radford et al., 2023) achieves superior performance by collecting 680k hours of data for web-scale model training. AudioPaLM2 (Rubenstein et al., 2023) integrates text- and speech-based language models into a unified architecture that processes and generates both text and speech, thereby substantially improving speech translation performance.
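The contrast drawn in the abstract, top-1 selection versus letting an LLM read the whole N-best list, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The `Hypothesis` type, the functions `top1_select`, `build_prompt` and `make_training_pair`, and the prompt wording are all hypothetical, and beam scores are assumed to be sequence log-probabilities. The sketch only shows how an N-best list and its reference translation might be paired into one supervised example of the kind HypoTranslate is described as containing. LLM finetuning and generation are left out.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One beam-search output (hypothetical container)."""
    text: str
    score: float  # assumed: sequence log-probability, higher is better


def top1_select(nbest: list[Hypothesis]) -> str:
    """Conventional inference: keep only the highest-scoring hypothesis."""
    return max(nbest, key=lambda h: h.score).text


def build_prompt(nbest: list[Hypothesis], tgt_lang: str) -> str:
    """Serialize the full N-best list into a hypothetical instruction prompt."""
    ranked = sorted(nbest, key=lambda h: h.score, reverse=True)
    lines = [
        f"Below are {len(ranked)} candidate translations into {tgt_lang}, "
        "ranked by beam-search score. Combine them into one best translation."
    ]
    for i, hyp in enumerate(ranked, start=1):
        lines.append(f"{i}. {hyp.text}")
    lines.append("Translation:")
    return "\n".join(lines)


def make_training_pair(nbest: list[Hypothesis], reference: str,
                       tgt_lang: str) -> dict[str, str]:
    """One (input, target) example for supervised finetuning of the LLM."""
    return {"input": build_prompt(nbest, tgt_lang), "target": reference}


# Toy N-best list (made-up sentences and scores, for illustration only).
nbest = [
    Hypothesis("A cat sits on the carpet.", -2.3),
    Hypothesis("The cat sits on the mat.", -1.2),
    Hypothesis("The cat is sitting on the mat.", -1.5),
]
print(top1_select(nbest))  # The cat sits on the mat.

pair = make_training_pair(nbest, "The cat is sitting on the mat.", "English")
print(pair["input"])
```

Top-1 selection discards the two lower-ranked candidates. The prompt keeps all of them, so a finetuned model could in principle draw on wording that appears only further down the list. Listing candidates in score order is a design assumption of this sketch, not a detail taken from the paper.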
On the other hand, LLMs also show remarkable ability in machine translation. NLLB (Costa-jussà et al., 2022) is the first to extend LLMs' linguistic capability to over 200 languages. BigTranslate (Yang et al., 2023b) is finetuned from LLaMA (Touvron et al., 2023a) with multilingual instruction tuning and achieves performance comparable to ChatGPT (OpenAI, 2022) and Google Translate. Most recently, SeamlessM4T (Barrault et al., 2023a) was proposed as a foundational multilingual and multitask model …

2 Related Work

2.1 Large Language Models

Research interest in Transformer-based large language models, such as ChatGPT (OpenAI, 2022), GPT-4 (OpenAI, 2023) and LLaMA (Touvron et al., 2023a,b), has recently surged. Benefiting from their giant model sizes and vast amounts of training data, LLMs better understand the linguistic structures and semantic meanings behind raw text, and thus show remarkable performance on a wide range of natural language processing (NLP) …