NeurIPS2022

Lipschitz Bandits with Batched Feedback

Yasong Feng, Zengfeng Huang, Tianyu Wang

被引用 18 次

摘要

In this paper, we study Lipschitz bandit problems with batched feedback, where the expected reward is Lipschitz and the reward observations are communicated to the player in batches. We introduce a novel landscape-aware algorithm, called Batched Lipschitz Narrowing (BLiN), that optimally solves this problem. Specifically, we show that for a <inline-formula> <tex-math notation="LaTeX"> $T$ </tex-math></inline-formula>-step problem with Lipschitz reward of zooming dimension <inline-formula> <tex-math notation="LaTeX"> $d_{z}$ </tex-math></inline-formula>, our algorithm achieves theoretically optimal (up to logarithmic factors) regret rate <inline-formula> <tex-math notation="LaTeX"> $\widetilde {\mathcal {O}}\left ({T^{\frac {d_{z}+1}{d_{z}+2}}}\right)$ </tex-math></inline-formula> using only <inline-formula> <tex-math notation="LaTeX"> $\mathcal {O} \left ({\log \log T}\right)$ </tex-math></inline-formula> batches. We also provide complexity analysis for this problem. Our theoretical lower bound implies that <inline-formula> <tex-math notation="LaTeX"> $\Omega (\log \log T)$ </tex-math></inline-formula> batches are necessary for any algorithm to achieve the optimal regret. Thus, BLiN achieves optimal regret rate (up to logarithmic factors) using minimal communication.