AAAI2025

Multiple Trade-offs: An Improved Approach for Lexicographic Linear Bandits

Bo Xue, Xi Lin, Xiaoyuan Zhang, Qingfu Zhang

4 citations

Abstract

This paper studies lexicographic online learning within the framework of multiobjective stochastic linear bandits (MOSLB), where the agent aims to simultaneously maximize multiple objectives in a hierarchical manner. Previous literature has investigated lexicographic online learning in multiobjective multi-armed bandits, a special case of MOSLB. They provided a suboptimal algorithm whose regret bound is approximately O(T^(2/3)) based on a priority-based regret metric. In this paper, we propose an algorithm for lexicographic online learning in the MOSLB model, achieving an almost optimal regret bound of approximately O(dT^(1/2)) when evaluated by the general regret metric. Here, d is the dimension of arm vectors, and T is the time horizon. Our method introduces a new arm filter and a multiple trade-offs approach to effectively balance exploration and exploitation across different objectives. Experiments confirm the merits of our algorithms and provide compelling evidence to support our analysis.