NeurIPS2021
Parallel Bayesian Optimization of Multiple Noisy Objectives with Expected Hypervolume Improvement
Samuel Daulton, Maximilian Balandat, Eytan Bakshy
被引用 228 次
摘要
Optimizing multiple competing black-box objectives is a challenging problem in many fields, including science, engineering, and machine learning. Multi-objective Bayesian optimization (MOBO) is a sample-efficient approach for identifying the optimal trade-offs between the objectives. However, many existing methods perform poorly when the observations are corrupted by noise. We propose a novel acquisition function, NEHVI, that overcomes this important practical limitation by applying a Bayesian treatment to the popular expected hypervolume improvement (EHVI) criterion and integrating over this uncertainty in the Pareto frontier. We argue that, even in the noiseless setting, generating multiple candidates in parallel is an incarnation of EHVI with uncertainty in the Pareto frontier and therefore can be addressed using the same underlying technique. Through this lens, we derive a natural parallel variant, qNEHVI, that reduces computational complexity of parallel EHVI from exponential to polynomial with respect to the batch size. qNEHVI is one-step Bayes-optimal for hypervolume maximization in both noisy and noiseless environments, and we show that it can be optimized effectively with gradient-based methods via sample average approximation. Empirically, we demonstrate not only that qNEHVI is substantially more robust to observation noise than existing MOBO approaches, but also that it achieves state-of-the-art optimization performance and competitive wall-times in large-batch environments. functions. Multi-objective Bayesian optimization (MOBO), which combines a Bayesian surrogate with an acquisition function designed for MOO, provides a much more sample-efficient alternative. Related Work Methods based on hypervolume improvement (HVI) seek to expand the volume of the objective space dominated by the Pareto frontier. Expected hypervolume improvement (EHVI) [16] is a natural extension of the popular expected improvement (EI) [29] acquisition function to the MOO setting. Recent work has led to efficient computational paradigms using box decomposition algorithms [59] and practical enhancements such as support for parallel candidate generation and gradient-based acquisition optimization [11, 58] . However, EHVI still suffers from some limitations, including (i) the assumption that observations are noise-free, and (ii) the exponential scaling of its batch variant, qEHVI, in the batch size q, which precludes large-batch optimization. DGEMO [39] is a recent method for parallel MOBO that greedily maximizes HVI while balancing the diversity of the design points being sampled. Although DGEMO scales well to large batch sizes, it does not account for noisy observations. TSEMO [5] is a Thompson sampling (TS) heuristic that can acquire batches of points by optimizing a random fourier feature (RFF) [46] approximation of a GP surrogate using NSGA-II and selecting a subset of points from the EA's population to sequentially greedily maximize HVI. This heuristic approach for maximizing HVI currently has no theoretical guarantees and relies on zeroth-order optimization methods, which tend to be slower and exhibit worse optimization performance than gradient-based approaches. Entropy-based methods such as PESMO [25], MESMO [3], and PFES [51] are an alternative to EHVI. Of these three methods, PESMO is the only one that accounts for observation noise. However, PESMO involves intractable entropy computations and therefore relies on complex approximations, as well as challenging and time-consuming numerical optimization procedures [25] . recently proposed an extension to PESMO that supports parallel candidate generation. However, the authors of this work provide limited evaluation and have not provided code to reproduce their results. 1 MOO can also be cast into a single-objective problem by applying a random scalarization of the objectives. ParEGO maximizes the expected improvement using random augmented Chebyshev scalarizations [32] . MOEA/D-EGO [64] extends ParEGO to the batch setting using multiple random scalarizations and the genetic algorithm MOEA/D [65] to optimize these scalarizations in parallel. Recently, qParEGO, another batch variant of ParEGO was proposed that uses compositional Monte Carlo objectives and sequential greedy candidate selection [11] . Additionally, the authors proposed a noisy variant, qNParEGO, but the empirical evaluation of that variant was limited. TS-TCH [45] combines random Chebyshev scalarizations with Thompson sampling [54] , which is naturally robust to noise when the objective is scalarized. Golovin & Zhang [23] propose to use a hypervolume scalarization with the property that the expected value of the scalarization over a specific distribution of weights is equivalent to the hypervolume indicator. The authors propose a upper confidence bound algorithm using randomly sampled weights, but provide a very limited empirical evaluation. Many prior attempts by the simulation community to handle MOO