NeurIPS2022

Lipschitz Bandits with Batched Feedback

Yasong Feng, Zengfeng Huang, Tianyu Wang

被引用 18 次

摘要

In this paper, we study Lipschitz bandit problems with batched feedback, where the expected reward is Lipschitz and the reward observations are communicated to the player in batches. We introduce a novel landscape-aware algorithm, called Batched Lipschitz Narrowing (BLiN), that optimally solves this problem. Specifically, we show that for a <inline-formula> <tex-math notation="LaTeX">TT </tex-math></inline-formula>-step problem with Lipschitz reward of zooming dimension <inline-formula> <tex-math notation="LaTeX">dzd_{z} </tex-math></inline-formula>, our algorithm achieves theoretically optimal (up to logarithmic factors) regret rate <inline-formula> <tex-math notation="LaTeX">O~(Tdz+1dz+2)\widetilde {\mathcal {O}}\left ({T^{\frac {d_{z}+1}{d_{z}+2}}}\right) </tex-math></inline-formula> using only <inline-formula> <tex-math notation="LaTeX">O(loglogT)\mathcal {O} \left ({\log \log T}\right) </tex-math></inline-formula> batches. We also provide complexity analysis for this problem. Our theoretical lower bound implies that <inline-formula> <tex-math notation="LaTeX">Ω(loglogT)\Omega (\log \log T) </tex-math></inline-formula> batches are necessary for any algorithm to achieve the optimal regret. Thus, BLiN achieves optimal regret rate (up to logarithmic factors) using minimal communication.