NeurIPS2022
Lipschitz Bandits with Batched Feedback
Yasong Feng, Zengfeng Huang, Tianyu Wang
被引用 18 次
摘要
In this paper, we study Lipschitz bandit problems with batched feedback, where the expected reward is Lipschitz and the reward observations are communicated to the player in batches. We introduce a novel landscape-aware algorithm, called Batched Lipschitz Narrowing (BLiN), that optimally solves this problem. Specifically, we show that for a <inline-formula> <tex-math notation="LaTeX"> </tex-math></inline-formula>-step problem with Lipschitz reward of zooming dimension <inline-formula> <tex-math notation="LaTeX"> </tex-math></inline-formula>, our algorithm achieves theoretically optimal (up to logarithmic factors) regret rate <inline-formula> <tex-math notation="LaTeX"> </tex-math></inline-formula> using only <inline-formula> <tex-math notation="LaTeX"> </tex-math></inline-formula> batches. We also provide complexity analysis for this problem. Our theoretical lower bound implies that <inline-formula> <tex-math notation="LaTeX"> </tex-math></inline-formula> batches are necessary for any algorithm to achieve the optimal regret. Thus, BLiN achieves optimal regret rate (up to logarithmic factors) using minimal communication.