AAAI2025

On the Asymptotic Optimality of Confidence Interval Based Algorithms for Fixed Confidence MABs

Kushal Kejriwal, Nikhil Karamchandani, Jayakrishnan Nair

摘要

We address the classical problem of constructing confidence intervals (CIs) for the mean of a distribution, given N i.i.d. samples, such that the CI contains the true mean with probability at least 1 -δ, where δ ∈ (0, 1). We characterize three distinct learning regimes based on the minimum achievable limiting width of any CI as the sample size N δ → ∞ and δ → 0. In the first regime, where N δ grows slower than log(1/δ), the limiting width of any CI equals the width of the distribution's support, precluding meaningful inference. In the second regime, where N δ scales as log(1/δ), we precisely characterize the minimum limiting width, which depends on the scaling constant. In the third regime, where N δ grows faster than log(1/δ), complete learning is achievable, and the limiting width of the CI collapses to zero and CI converges to the true mean. We demonstrate that CIs derived from concentration inequalities based on Kullback-Leibler (KL) divergences achieve asymptotically optimal performance, attaining the minimum limiting width in both the sufficient and the complete learning regimes for distributions in three families: single-parameter exponential, bounded support and known bound on (1 + ϵ) th moment. Additionally, these results extend to one-sided CIs, with the width notion adjusted appropriately. Finally, we generalize our findings to settings with random per-sample costs, motivated by practical applications such as stochastic simulators and cloud service selection. Instead of a fixed sample size, we consider a cost budget C δ , identifying analogous learning regimes and characterizing the optimal CI construction policy.