ICML 2025

LOGO - Long cOntext aliGnment via efficient preference Optimization

Zecheng Tang, Zechen Sun, Juntao Li, Qiaoming Zhu, Min Zhang

Abstract

Long-context models (LCMs) have shown great potential in processing long sequences, and prior research shows they can accurately locate token-level salient information. Yet the generation performance of these LCMs is far from satisfactory and can result in misaligned responses, such as hallucinations. To enhance generation capability, existing works have investigated the effects of data size and quality in both the pre-training and instruction-tuning stages. Although they achieve meaningful improvements, these methods fall short in either effectiveness or efficiency. In this paper, we introduce LOGO, an efficient and effective training strategy that is the first to apply preference optimization to long-context alignment. LOGO consists of a reference-free preference optimization strategy and a corresponding efficient data synthesis process. By training with only 0.3B data on a single 8×A800 GPU machine for 16 hours, LOGO allows the Llama-3-8B-Instruct-80K model to achieve performance comparable to GPT-4 on real-world long-context tasks while preserving the model's original capabilities on other tasks, e.g., language modeling and MMLU. In addition, LOGO can extend models' context window size while enhancing their performance.