ICML 2025

Triple-Optimistic Learning for Stochastic Contextual Bandits with General Constraints

Hengquan Guo, Lingkai Zu, Xin Liu

Abstract

We study contextual bandits with general constraints, where a learner observes contexts and aims to maximize cumulative reward while satisfying a broad class of constraints. We introduce the Optimistic$^3$ framework, a novel learning and decision-making approach that integrates optimistic design into parameter learning, primal decision, and dual violation adaptation (i.e., triple-optimism), combined with an efficient primal-dual architecture. Optimistic$^3$ achieves $\tilde{O}(\sqrt{T})$ regret and constraint violation for contextual bandits with general constraints. This framework not only outperforms the state-of-the-art results that achieve $\tilde{O}(T^{3/4})$ guarantees when Slater's condition does not hold, but also improves on previous results that achieve $\tilde{O}(\sqrt{T}/\delta)$ when Slater's condition holds (where $\delta$ denotes the Slater's condition parameter), an $O(1/\delta)$ improvement. This improvement is significant because $\delta$ can be arbitrarily small when the constraints are particularly challenging. Moreover, we show that Optimistic$^3$ can be extended to classical multi-armed bandits with both stochastic and adversarial constraints, recovering the best-of-both-worlds guarantee established in state-of-the-art works, but with significantly less computational overhead.