NeurIPS2025

Precise Asymptotics and Refined Regret of Variance-Aware UCB

Yingying Fan, Yuxuan Han, Jinchi Lv, Xiaocong Xu, Zhengyuan Zhou

2 citations

Abstract

In this paper, we study the behavior of the Upper Confidence Bound-Variance (UCB-V) algorithm for the Multi-Armed Bandit (MAB) problems, a variant of the canonical Upper Confidence Bound (UCB) algorithm that incorporates variance estimates into its decisionmaking process. More precisely, we provide an asymptotic characterization of the armpulling rates for UCB-V, extending recent results for the canonical UCB in Kalvit and Zeevi (2021) and Khamaru and Zhang (2024) . In an interesting contrast to the canonical UCB, our analysis reveals that the behavior of UCB-V can exhibit instability, meaning that the arm-pulling rates may not always be asymptotically deterministic. Besides the asymptotic characterization, we also provide non-asymptotic bounds for the arm-pulling rates in the high probability regime, offering insights into the regret analysis. As an application of this high probability result, we establish that UCB-V can achieve a more refined regret bound, previously unknown even for more complicate and advanced variance-aware online decision-making algorithms. 1. It is worth noting that a very recent companion work Han et al. (2024) to Khamaru and Zhang (2024) also provides precise regret analysis through a deterministic characterization of arm-pulling rates of the UCB algorithm for multi-armed bandits with Gaussian rewards.