NeurIPS2021

Optimal Order Simple Regret for Gaussian Process Bandits

Sattar Vakili, Nacime Bouziani, Sepehr Jalali, Alberto Bernacchia, Da-Shan Shiu

62 citations

Abstract

Consider the sequential optimization of a continuous, possibly non-convex, and expensive to evaluate objective function f . The problem can be cast as a Gaussian Process (GP) bandit where f lives in a reproducing kernel Hilbert space (RKHS). The state of the art analysis of several learning algorithms shows a significant gap between the lower and upper bounds on the simple regret performance. When N is the number of exploration trials and γ N is the maximal information gain, we prove an Õ( γ N /N ) bound on the simple regret performance of a pure exploration algorithm that is significantly tighter than the existing bounds. We show that this bound is order optimal up to logarithmic factors for the cases where a lower bound on regret is known. To establish these results, we prove novel and sharp confidence intervals for GP models applicable to RKHS elements which may be of broader interest. d+1 d+2 ) cumulative regret. We do not compare with these results due to the inherent difference in the regularity assumptions. Organization In § 2, the problem formulation, the regularity assumptions, and the preliminaries on RKHS and GP models are presented. The novel confidence intervals for GP models are proven in § 3. MVR