ICML2021

Optimal Streaming Algorithms for Multi-Armed Bandits

Tianyuan Jin, Keke Huang, Jing Tang, Xiaokui Xiao

16 citations

Abstract

This paper studies two variants of the best arm identification (BAI) problem under the streaming model, where we have a stream of nn arms with reward distributions supported on [0,1][0,1] with unknown means. The arms in the stream are arriving one by one, and the algorithm cannot access an arm unless it is stored in a limited size memory. We first study the streaming -toptop-kk arms identification problem, which asks for kk arms whose reward means are lower than that of the kk-th best arm by at most \eps\eps with probability at least 1δ1-\delta. For general \eps(0,1)\eps \in (0,1), the existing solution for this problem assumes k=1k = 1 and achieves the optimal sample complexity O(n\eps2log1δ)O(\frac{n}{\eps^2} \log \frac{1}{\delta}) using O(log(n))O(\log^*(n)) (log(n)\log^*(n) equals the number of times that we need to apply the logarithm function on nn before the results is no more than 1.) memory and a single pass of the stream. We propose an algorithm that works for any kk and achieves the optimal sample complexity O(n\eps2logkδ)O(\frac{n}{\eps^2} \log\frac{k}{\delta}) using a single-arm memory and a single pass of the stream. Second, we study the streaming BAI problem, where the objective is to identify the arm with the maximum reward mean with at least 1δ1-\delta probability, using a single-arm memory and as few passes of the input stream as possible. We present a single-arm-memory algorithm that achieves a near instance-dependent optimal sample complexity within O(logΔ21)O(\log \Delta_2^{-1}) passes, where Δ2\Delta_2 is the gap between the mean of the best arm and that of the second best arm.