ICML2023

Bayesian Design Principles for Frequentist Sequential Learning

Yunbei Xu, Assaf Zeevi

19 citations

Abstract

We develop a general theory to optimize the frequentist regret for sequential learning problems, from which efficient bandit and reinforcement learning algorithms can be derived via unified Bayesian principles. Building on the recent Decision-Estimation Coefficient (DEC) framework, we propose a novel optimization approach to generate “algorithmic beliefs” at each round and use Bayesian posteriors for decision-making. The optimization objective, termed “Algorithmic Information Ratio” (AIR), represents an intrinsic complexity measure that effectively characterizes the frequentist regret of any algorithm. Although AIR’s minimax regret aligns with that provided by DEC, it additionally offers an algorithm-dependent perspective–distinct from a minimax complexity–facilitating algorithm design and analysis. Specifically, AIR enables deriving explicit algorithms via belief parameterization and provides clear approximation guidelines with provable guarantees. Moreover, the resulting algorithms have a simple structure and are computationally efficient for several representative problems. We illustrate our framework with a novel algorithm for multi-armed bandits that performs strongly across stochastic, adversarial, and non-stationary environments, and demonstrate applicability to linear bandits, convex bandits, and reinforcement learning.