AAAI2025
Nearly Tight Bounds for Exploration in Streaming Multi-Armed Bandits with Known Optimality Gap
Nikolai Karpov, Chen Wang
1 citation
Abstract
We investigate the sample-memory-pass trade-offs for pure exploration in multi-pass streaming multi-armed bandits (MABs) with the a priori knowledge of the optimality gap ∆ [2] . Here, and throughout, the optimality gap ∆ [i] is defined as the mean reward gap between the best and the i-th best arms. A recent line of results by Jin, Huang, Tang, and Xiao [ICML'21] and Assadi and Wang [COLT'24] have shown that if there is no known ) terms) is necessary and sufficient to obtain the worst-case optimal sample complexity of O(n/∆ 2 [2] ) with a single-arm memory. However, our understanding of multi-pass algorithms with known ∆ [2] is still limited. Here, the key open problem is how many passes are required to achieve the complexity, i.e., O( n i=2 1/∆ 2 [i] ) arm pulls, with a sublinear memory size. In this work, we show that the "right answer" for the question is Θ(log n) passes (up to log log n terms). We first present a lower bound, showing that any algorithm that finds the best arm with slightly sublinear memory -a memory of o(n/polylog(n)) arms -and arm pulls has to make Ω( log n log log n ) passes over the stream. We then show a nearly-matching algorithm that assuming the knowledge of ∆ [2] , finds the best arm with O( n i=2 1/∆ 2 [i] • log n) arm pulls and a single arm memory.