ACL 2024

Multi-Dimensional Optimization for Text Summarization via Reinforcement Learning

Sangwon Ryu, Heejin Do, Yunsu Kim, Gary Lee, Jungseul Ok

Abstract

The evaluation of summary quality encompasses diverse dimensions such as consistency, coherence, relevance, and fluency. However, existing summarization methods often target a single dimension and struggle to generate summaries that are well balanced across several. In this paper, we propose multi-objective reinforcement learning tailored to generate balanced summaries across all four dimensions. We introduce two multi-dimensional optimization (MDO) strategies for adaptive learning: 1) MDO_min, which rewards the dimension with the currently lowest score, and 2) MDO_pro, which optimizes multiple dimensions in a manner similar to multi-task learning and resolves conflicting gradients across dimensions through gradient projection. Unlike prior ROUGE-based rewards, which rely on reference summaries, we use a QA-based reward model that aligns with human preferences. Further, we find that summary length can be regulated by adjusting the discount factor, which encourages concise yet informative summaries that capture the crucial points. Our approach achieves substantial performance gains over baseline models on representative summarization datasets, particularly in the overlooked dimensions.

* Equal contribution
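The two MDO strategies named in the abstract can be illustrated with a minimal sketch. It rests on assumptions, since the abstract does not give the exact formulation. The function names `min_dimension_reward` and `project_conflicts` are hypothetical, the per-dimension scores are assumed to be comparable scalars, and the projection rule is the common gradient-surgery form: when two gradients have a negative inner product, the component of one along the other is removed. Gradients are plain Python lists here to keep the example self-contained.

```python
from typing import Dict, List, Sequence


def min_dimension_reward(scores: Dict[str, float]) -> float:
    """MDO_min-style reward (assumed form): the score of the weakest dimension."""
    return min(scores.values())


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def project_conflicts(grads: Sequence[Sequence[float]]) -> List[float]:
    """MDO_pro-style update direction (assumed form).

    For each per-dimension gradient, remove its component along any other
    gradient it conflicts with (negative inner product), then sum the results.
    """
    adjusted = []
    for i, g in enumerate(grads):
        g_i = list(g)
        for j, other in enumerate(grads):
            if i == j:
                continue
            d = _dot(g_i, other)
            if d < 0:  # conflict; d < 0 also implies `other` is non-zero
                scale = d / _dot(other, other)
                g_i = [x - scale * y for x, y in zip(g_i, other)]
        adjusted.append(g_i)
    # Sum the adjusted gradients coordinate-wise.
    return [sum(col) for col in zip(*adjusted)]


# Illustrative scores for the four dimensions (made-up numbers).
scores = {"consistency": 0.9, "coherence": 0.6, "relevance": 0.7, "fluency": 0.95}
reward = min_dimension_reward(scores)

# Two toy gradients that conflict: their inner product is -1.
combined = project_conflicts([[1.0, 0.0], [-1.0, 1.0]])
```

In the toy case, the combined direction has a non-negative inner product with both original gradients, so a step along it does not move against either dimension to first order. In a real training loop the gradients would come from the policy's parameters, one backward pass per dimension.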