ICLR 2025
Reward Dimension Reduction for Scalable Multi-Objective Reinforcement Learning
Giseung Park, Youngchul Sung
Abstract
We propose Deep Optimistic Linear Support Learning (DOL) to solve high-dimensional multi-objective decision problems in which the relative importance of the objectives is not known a priori. Using features extracted from the high-dimensional inputs, DOL computes the convex coverage set, which contains every solution that is optimal for some convex combination of the objectives. To our knowledge, this is the first time that deep reinforcement learning has succeeded in learning multi-objective policies. In addition, we provide a testbed with two experiments to serve as a benchmark for deep multi-objective reinforcement learning.
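To make the notion of a convex coverage set concrete, the following minimal sketch approximates it for a small two-objective example. It is not the DOL algorithm: it simply sweeps a grid of convex weights and records which candidate is best for each one. The function name `approximate_ccs`, the weight grid, and the candidate value vectors are all hypothetical and chosen only for illustration.

```python
import numpy as np


def approximate_ccs(values, num_weights=101):
    """Approximate the convex coverage set of two-objective value vectors.

    Returns the sorted indices of the vectors that maximize the scalarized
    value w1 * v[0] + (1 - w1) * v[1] for at least one sampled weight w1.
    This is a brute-force weight sweep, not optimistic linear support.
    """
    values = np.asarray(values, dtype=float)      # shape: (num_candidates, 2)
    w1 = np.linspace(0.0, 1.0, num_weights)       # weight on objective 0
    weights = np.stack([w1, 1.0 - w1], axis=1)    # each row sums to 1
    scalarized = weights @ values.T               # (num_weights, num_candidates)
    best = np.argmax(scalarized, axis=1)          # best candidate per weight
    return sorted(set(best.tolist()))


# Hypothetical value vectors (objective 0, objective 1) for five candidate policies.
candidate_values = [
    [1.0, 9.0],  # best when objective 1 dominates the weighting
    [5.0, 7.0],  # best for intermediate weightings
    [8.0, 2.0],  # best when objective 0 dominates the weighting
    [4.0, 4.0],  # Pareto-dominated by [5.0, 7.0]
    [4.5, 7.2],  # Pareto-optimal, but never best under any convex weighting
]

ccs_indices = approximate_ccs(candidate_values)
print(ccs_indices)
```

In this toy example, the candidate `[4.5, 7.2]` is not dominated by any single alternative, yet it lies below the line joining `[1.0, 9.0]` and `[5.0, 7.0]`, so no convex combination of the objectives selects it. That gap between the Pareto front and the convex coverage set is what the sketch is meant to show.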