NeurIPS2023

First- and Second-Order Bounds for Adversarial Linear Contextual Bandits

Julia Olkhovskaya, Jack J. Mayo, Tim van Erven, Gergely Neu, Chen-Yu Wei

被引用 17 次

摘要

We consider the adversarial linear contextual bandit setting, which allows for the loss functions associated with each of KK arms to change over time without restriction. Assuming the dd-dimensional contexts are drawn from a fixed known distribution, the worst-case expected regret over the course of TT rounds is known to scale as O~(KdT)\tilde O(\sqrt{Kd T}). Under the additional assumption that the density of the contexts is log-concave, we obtain a second-order bound of order O~(KdVT)\tilde O(K\sqrt{d V_T}) in terms of the cumulative second moment of the learner's losses VTV_T, and a closely related first-order bound of order O~(KdLT)\tilde O(K\sqrt{d L_T^*}) in terms of the cumulative loss of the best policy LTL_T^*. Since VTV_T or LTL_T^* may be significantly smaller than TT, these improve over the worst-case regret whenever the environment is relatively benign. Our results are obtained using a truncated version of the continuous exponential weights algorithm over the probability simplex, which we analyse by exploiting a novel connection to the linear bandit setting without contexts.