ICML2024
Noise-Adaptive Confidence Sets for Linear Bandits and Application to Bayesian Optimization
Kwang-Sung Jun, Jungtaek Kim
4 citations
Abstract
Adapting to a priori unknown noise level is a very important but challenging problem in sequential decision-making as efficient exploration typically requires knowledge of the noise level, which is often loosely specified. We report significant progress in addressing this issue for linear bandits in two respects. First, we propose a novel confidence set that is 'semi-adaptive' to the unknown sub-Gaussian parameter σ 2 * in the sense that the (normalized) confidence width scales with dσ 2 * + σ 2 0 where d is the dimension and σ 2 0 is the specified sub-Gaussian parameter (known) that can be much larger than σ 2 * . This is a significant improvement over dσ 2 0 of the standard confidence set of Abbasi-Yadkori et al. ( 2011 ), especially when d is large. We show that this leads to an improved regret bound in linear bandits. Second, for bounded rewards, we propose a novel variance-adaptive confidence set that has much improved numerical performance upon prior art. We then apply this confidence set to develop, as we claim, the first practical variance-adaptive linear bandit algorithm via an optimistic approach, which is enabled by our novel regret analysis technique. Both of our confidence sets rely critically on 'regret equality' from online learning. Our empirical evaluation in diverse Bayesian optimization tasks shows that our proposed algorithms demonstrate better or comparable performance compared to existing methods. The first setup is when the noise η t | F t-1 is σ 2 * -sub-Gaussian where the sub-Gaussian parameter specified to the algorithm is σ 2 0 that can be much larger than σ 2 * . We propose a novel linear bandit algorithm called LOSAN (Linear Optimism with Semi-Adaptivity to Noise). The critical ingredient for this algorithm is a novel confidence set whose (normalized) confidence width contains online variance estimators and is no larger than Õ( dσ 2 * + σ 2 0 ) with high probability where Õ hides polylogarithmic factors. This is no worse than Õ( dσ 2 0 + σ 2 0 ) of the standard confidence 1 The implementation of our proposed methods is available at https://github.com/jungtaekkim/LOSAN-LOFAV .