ACL 2025

Enhancing Machine Translation with Self-Supervised Preference Data

Haoxiang Sun, Ruize Gao, Pei Zhang, Baosong Yang, Rui Wang

Cited by 7

Abstract

Model alignment methods such as Direct Preference Optimization (DPO; Rafailov et al., 2024) and Contrastive Preference Optimization (Xu et al., 2024b) have improved machine translation performance by using preference data to teach models to reject suboptimal outputs. To construct preference data, previous approaches rely primarily on humans, strong models such as GPT-4 (OpenAI, 2023), or model self-sampling. In this study, we first explain the shortcomings of this practice. We then propose Self-Supervised Preference Optimization (SSPO), a novel framework that efficiently constructs translation preference data for iterative DPO training. Applying SSPO to 14B-parameter large language models (LLMs) achieves performance comparable to or better than GPT-4o on FLORES and multi-domain test datasets. We release an augmented MQM dataset at https://github.com/sunny-sjtu/MQM-aug.

* Work done during internship at Tongyi Lab.
† Rui Wang and Baosong Yang are co-corresponding authors.
* We use gpt-4o-0806 available from the OpenAI API.
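For readers unfamiliar with the DPO objective that the abstract builds on, the following is a minimal sketch of the standard per-pair DPO loss from Rafailov et al. It illustrates only the generic loss, not the SSPO data construction procedure, which the abstract does not detail. The function name, the argument names, and the default `beta=0.1` are illustrative assumptions; the inputs are assumed to be sequence-level log-probabilities of the preferred ("chosen") and dispreferred ("rejected") translations under the policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair.

    All inputs are sequence-level log-probabilities (floats).
    beta=0.1 is an assumed illustrative value, not taken from the paper.
    """
    # Implicit rewards: log-ratio of policy to reference for each output.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # Loss is -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities for one (source, chosen, rejected) triple.
loss = dpo_loss(policy_chosen_logp=-1.0, policy_rejected_logp=-5.0,
                ref_chosen_logp=-3.0, ref_rejected_logp=-3.0)
```

When the policy assigns relatively more probability to the chosen translation than the reference model does, the margin is positive and the loss falls below log 2, the value obtained when policy and reference agree.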