ACL 2025
Generative Error Correction for Emotion-aware Speech-to-text Translation
Zhengdong Yang, Sheng Li, Chenhui Chu
Abstract
This paper explores emotion-aware speech-to-text translation (ST) using generative error correction (GER) with large language models (LLMs). Despite recent advances in ST, the impact of emotional content has been overlooked. First, we enhance the translation of emotional speech by adopting the GER paradigm: we finetune an LLM to generate the translation from the decoded 𝑁-best hypotheses. Next, we incorporate emotion labels into the LLM finetuning process so that the model can take the emotional content into account. Experiments show that both GER and the integration of emotion labels are effective on the English-Japanese language pair. This research lays the foundation for more sophisticated models that consider emotional nuances in speech.
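As an illustration of the GER input described above, the following is a minimal sketch of how an 𝑁-best list and an optional emotion label might be assembled into a single LLM prompt. The function name `build_ger_prompt`, the prompt wording, and the example hypotheses are hypothetical and chosen only for illustration; they are not the prompt template or data used in this paper.

```python
from typing import List, Optional


def build_ger_prompt(
    hypotheses: List[str],
    emotion: Optional[str] = None,
    target_lang: str = "Japanese",
) -> str:
    """Assemble a GER-style prompt from N-best hypotheses.

    The wording of the prompt is an assumption made for this sketch,
    not the template used by the authors.
    """
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")

    lines = [
        f"Below are the {len(hypotheses)}-best hypotheses "
        "from a speech translation system."
    ]
    # The emotion label is optional, so the same function covers
    # both the plain GER setting and the emotion-aware setting.
    if emotion is not None:
        lines.append(f"The speaker's emotion is: {emotion}.")
    # Number the hypotheses from 1 (best) to N.
    lines.extend(f"{i}. {h}" for i, h in enumerate(hypotheses, start=1))
    lines.append(f"Write the most accurate {target_lang} translation:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical 3-best list for one utterance.
    nbest = [
        "信じられない、勝った!",
        "信じられない、買った!",
        "信じられない、勝ったの?",
    ]
    print(build_ger_prompt(nbest, emotion="happy"))
```

During finetuning, such a prompt would be paired with the reference translation as the target; omitting the `emotion` argument gives the corresponding input without the emotion label.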