NeurIPS2023

Asymptotically Optimal Quantile Pure Exploration for Infinite-Armed Bandits

Evelyn Xiao-Yue Gong, Mark Sellke

被引用 2 次

摘要

We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown distribution. Our goal is to efficiently select a single high quality arm whose average reward is, with probability 1 − δ , within ε of being with the top η -fraction of arms. For fixed confidence, we give an algorithm with expected sample complexity O (cid:16) log(1 /δ )log(1 /η ) ηε 2 (cid:17) which matches a known lower bound up to the log(1 /η ) factor. In particular the δ -dependence is optimal and closes a quadratic gap. For fixed budget, we show the asymptotically optimal sample complexity as δ → 0 is log(1 /δ ) (cid:0) log log(1 /δ ) (cid:1) 2 /c . The value of c depends explicitly on the problem parameters (including the unknown arm distribution) through a certain Fisher information distance. Even the strictly super-linear dependence on log(1 /δ ) was not known and resolves a question of [GM20].