NeurIPS2020

Re-Examining Linear Embeddings for High-Dimensional Bayesian Optimization

Benjamin Letham, Roberto Calandra, Akshara Rai, Eytan Bakshy

140 citations

Abstract

Bayesian optimization (BO) is a popular approach to optimize expensive-toevaluate black-box functions. A significant challenge in BO is to scale to highdimensional parameter spaces while retaining sample efficiency. A solution considered in existing literature is to embed the high-dimensional space in a lowerdimensional manifold, often via a random linear embedding. In this paper, we identify several crucial issues and misconceptions about the use of linear embeddings for BO. We study the properties of linear embeddings from the literature and show that some of the design choices in current approaches adversely impact their performance. We show empirically that properly addressing these issues significantly improves the efficacy of linear embeddings for BO on a range of problems, including learning a gait policy for robot locomotion. Problem Framework and REMBO In this section, we define the problem framework and notation, and then describe REMBO, along with known challenges and follow-up work that has been proposed to address those issues. Bayesian Optimization We consider the problem min x∈B f (x) where f is a black-box function and B are box bounds. We assume gradients of f are unavailable. The box bounds on x specify the range of values that are reasonable or physically possible to evaluate. For instance, [18] used BO for an environmental remediation problem in which each x i represents a pumping rate of a particular pump, which has physical limitations. The problem may also include nonlinear constraints c j (x) ≤ 0 where each c j is itself a black-box function. BO is a form of sequential model-based optimization, where we fit a surrogate model for f that is used to identify which parameters x should be evaluated next. The surrogate model is typically a GP, f ∼ GP(m(•), k(•, •)), with mean function m(•) and a kernel k(•, •). Under the GP prior, the posterior for the value of f (x) at any point in the space is a normal distribution with closed-form mean and variance. Using that posterior, we construct an acquisition function α(x) that specifies the utility of evaluating f at x, such as Expected Improvement (EI) [25] . We find x * ∈ arg max x∈B α(x), and in the next iteration evaluate f (x * ). GPs are useful for BO because they provide a well-calibrated posterior in closed form. With many kernels and acquisition functions, α(x) is differentiable and can be efficiently optimized. However, typical kernels like the ARD RBF kernel have significant limitations. GPs are known to predict poorly for dimension D larger than 15-20 [49, 31, 37], which prevents the use of standard BO in high dimensions. In HDBO, the objective f : R D → R operates in a high-dimensional (D) space, which we call the ambient space. When using linear embeddings for HDBO, we assume there exists a low-dimensional linear subspace that captures all of the variation of f . Specifically, let f d : R d → R, d D, and let T ∈ R d×D be a projection from D down to d dimensions. The linear embedding assumption is that f (x) = f d (T x) ∀x ∈ R D . T is unknown, and we only have access to f , not f d . We assume, without any loss of generality, that the box bounds are B = [-1, 1] D ; the ambient space can always be scaled to these bounds.