AAAI2024

Multiobjective Lipschitz Bandits under Lexicographic Ordering

Bo Xue, Ji Cheng, Fei Liu, Yimu Wang, Qingfu Zhang

被引用 4 次

摘要

This paper studies the multiobjective bandit problem under lexicographic ordering, wherein the learner aims to simultaneously maximize m objectives hierarchically. The only existing algorithm for this problem considers the multi-armed bandit model, and its regret bound is O((KT ) 2/3 ) under a metric called priority-based regret. However, this bound is suboptimal, as the lower bound for single objective multiarmed bandits is Ω(K log T ). Moreover, this bound becomes vacuous when the arm number K is infinite. To address these limitations, we investigate the multiobjective Lipschitz bandit model, which allows for an infinite arm set. Utilizing a newly designed multi-stage decision-making strategy, we develop an improved algorithm that achieves a general regret bound of O(T (d i z +1)/(d i z +2) ) for the i-th objective, where d i z is the zooming dimension for the i-th objective, with i ∈ 1, 2, . . . , m. This bound matches the lower bound of the single objective Lipschitz bandit problem in terms of T , indicating that our algorithm is almost optimal. Numerical experiments confirm the effectiveness of our algorithm.