ICML2025
Improved Regret Analysis in Gaussian Process Bandits: Optimality for Noiseless Reward, RKHS norm, and Non-Stationary Variance
Shogo Iwazaki, Shion Takeno
摘要
We study the Gaussian process (GP) bandit problem, whose goal is to minimize regret under an unknown reward function lying in some reproducing kernel Hilbert space (RKHS). The maximum posterior variance analysis is vital in analyzing near-optimal GP bandit algorithms such as maximum variance reduction (MVR) and phased elimination (PE). Therefore, we first show the new upper bound of the maximum posterior variance, which improves the dependence of the noise variance parameters of the GP. By leveraging this result, we refine the MVR and PE to obtain (i) a nearly optimal regret upper bound in the noiseless setting and (ii) regret upper bounds that are optimal with respect to the RKHS norm of the reward function. Furthermore, as another application of our proposed bound, we analyze the GP bandit under the time-varying noise variance setting, which is the kernelized extension of the linear bandit with heteroscedastic noise. For this problem, we show that MVR and PE-based algorithms achieve noise variance-dependent regret upper bounds, which matches our regret lower bound. [2022] have shown that phased elimination (PE) is near-optimal for the cumulative regret. The regret analysis of MVR and PE heavily depends on the upper bound for the maximum posterior variance. We derive the upper bound of the maximum posterior variance in Section 3, by which we tackle tightening the regret upper bound in the settings where room for improvement remains. Our contributions are summarized as follows: 1. In Section 3, we obtain the upper bound of the maximum posterior variance (Lemma 3.1 and Corollary 3.2). Our proposed bound is tighter than the existing bound when the noise variances approach zero. 2. In Section 4, we analyze the GP bandit under the noiseless setting. We show a novel result that PE achieves the cumulative regret upper bound that matches the conjectured lower bound shown by Vakili [2022] under common assumptions in the GP bandit literature. Furthermore, we prove that MVR achieves the exponentially converging and near-optimal simple regret upper bounds for squared exponential (SE) and Matérn kernels, respectively. These results are summarized in Tables 1 2 3. In Section 5, we show that the modified PE-and MVR-style algorithms achieve the near-optimal cumulative and simple regret upper bounds with respect to the RKHS norm upper bound of the reward function under several conditions. These results are summarized in Tables 3 4 5 4. In Section 6, we analyze the GP-bandit problem with the non-stationary noise variance, which is the kernelized extension of the linear bandit with heteroscedastic noise [Zhou et al., 2021]. We first study the regret lower bound. Then, we show that the modified PE-and MVR-style algorithms achieve the near-optimal cumulative and simple regret upper bounds, respectively. To our knowledge, our analyses are the first for this setting, though the non-stationary noise is a frequently faced problem.