ICML 2023

Optimal Arms Identification with Knapsacks

Shaoang Li, Lan Zhang, Yingqi Yu, Xiangyang Li

Cited 5 times

Abstract

We give a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the 'Track-and-Stop' strategy, which we prove to be asymptotically optimal. It consists of a new sampling rule, which tracks the optimal proportions of arm draws highlighted by the lower bound, and a stopping rule named after Chernoff, for which we give a new analysis.

Keywords: multi-armed bandits, best arm identification, MDL.

1. Optimality is mentioned in several articles, with different and sometimes weak meanings (minimax, rate-optimal, ...). In our view, BAI algorithms for which there exists a model with a sample complexity bounded, up to a multiplicative constant, by some quantity related to some lower bound, may not be called optimal.
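The two ingredients named in the abstract, a sampling rule that tracks target proportions and a Chernoff-style stopping rule, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes Gaussian arms with known variance, takes the target proportions as an input instead of computing them from the lower bound, and uses a heuristic threshold log((log t + 1)/delta). The function names `glr_statistic`, `should_stop`, and `tracking_choice` are hypothetical.

```python
import math


def glr_statistic(means, counts, sigma=1.0):
    """Smallest pairwise generalized likelihood ratio statistic between the
    empirical best arm and every other arm.

    Assumes Gaussian rewards with known standard deviation `sigma` and that
    every arm has been pulled at least once (all counts > 0).
    """
    best = max(range(len(means)), key=lambda a: means[a])
    stats = []
    for b in range(len(means)):
        if b == best:
            continue
        n_a, n_b = counts[best], counts[b]
        gap = means[best] - means[b]
        # Closed form of the Gaussian pairwise statistic.
        stats.append(n_a * n_b / (n_a + n_b) * gap ** 2 / (2 * sigma ** 2))
    return min(stats)


def should_stop(means, counts, delta, sigma=1.0):
    """Chernoff-style stopping test with a heuristic threshold (an assumption,
    not a threshold taken from the abstract)."""
    t = sum(counts)
    threshold = math.log((math.log(t) + 1) / delta)
    return glr_statistic(means, counts, sigma) > threshold


def tracking_choice(counts, target_weights):
    """Pick the next arm so that empirical proportions track `target_weights`.

    `target_weights` is supplied by the caller here; in the strategy described
    in the abstract it would come from the lower-bound optimization.
    Arms pulled fewer than sqrt(t) - K/2 times are explored first.
    """
    t = sum(counts)
    k = len(counts)
    under_sampled = [a for a in range(k) if counts[a] < math.sqrt(t) - k / 2]
    if under_sampled:
        return min(under_sampled, key=lambda a: counts[a])
    return max(range(k), key=lambda a: t * target_weights[a] - counts[a])
```

A caller would alternate `tracking_choice` and `should_stop` in a loop, updating `means` and `counts` after each pull and recommending the empirical best arm once the stopping test fires.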