STOC2021
Linear bandits with limited adaptivity and learning distributional optimal design
Yufei Ruan, Jiaqi Yang, Yuan Zhou
19 citations
Abstract
Motivated by practical needs such as large-scale learning, we study the impact of adaptivity constraints to linear contextual bandits, a central problem in online learning and decision making. We consider two popular limited adaptivity models in literature: batch learning and rare policy switches. We show that, when the context vectors are adversarially chosen in d-dimensional linear contextual bandits, the learner needs O(d log d log T ) policy switches to achieve the minimax-optimal regret, and this is optimal up to poly(log d, log log T ) factors; for stochastic context vectors, even in the more restricted batch learning model, only O(log log T ) batches are needed to achieve the optimal regret. Together with the known results in literature, our results present a complete picture about the adaptivity constraints in linear contextual bandits. Along the way, we propose the distributional optimal design, a natural extension of the optimal experiment design, and provide a both statistically and computationally efficient learning algorithm for the problem, which may be of independent interest. Author names are listed in alphabetical order.