NeurIPS2020

Bayesian Probabilistic Numerical Integration with Tree-Based Models

Harrison Zhu, Xing Liu, Ruya Kang, Zhichao Shen, Seth R. Flaxman, François-Xavier Briol

被引用 7 次

摘要

Bayesian quadrature (BQ) is a method for solving numerical integration problems in a Bayesian manner, which allows users to quantify their uncertainty about the solution. The standard approach to BQ is based on a Gaussian process (GP) approximation of the integrand. As a result, BQ is inherently limited to cases where GP approximations can be done in an efficient manner, thus often prohibiting very high-dimensional or non-smooth target functions. This paper proposes to tackle this issue with a new Bayesian numerical integration algorithm based on Bayesian Additive Regression Trees (BART) priors, which we call BART-Int. BART priors are easy to tune and well-suited for discontinuous functions. We demonstrate that they also lend themselves naturally to a sequential design setting and that explicit convergence rates can be obtained in a variety of settings. The advantages and disadvantages of this new methodology are highlighted on a set of benchmark tests including the Genz functions, and on a Bayesian survey design problem. 1 χ k : X → R is an indicator function taking value 1 whenever x ∈ χ k and 0 otherwise. Similarly to decision trees and random forest, a single tree comprising of a large number of leafs might overfit to the training data. As a result, it is common to use an ensemble of shallow trees. We will call T -additive regression tree any function which takes the form of a sum of regression trees: g E,B (x) := T t=1 g Tt,βt (x), where E := T t T t=1 and B := β t T t=1 . Finally, we call Bayesian additive regression tree (BART) any distribution on the family of T -additive regression trees. Such a distribution can be constructed by specifying a (prior) distribution on the partition E and leaf values B. BART is hence a stochastic process whose sample space Ω consists of the product space of K t -partitions of X and R Kt . For simplicity, BART is usually restricted to an approximation domain X = [0, 1] d , but this is not a requirement in full generality. Denote by p the density of this prior distribution. We will follow the majority of the BART literature [13; 14; 95] and use prior models which factorise in the following way: p(E, B) := T t=1 p(T t )p(β t |T t ) and p(β t |T t ) := Kt k=1 N (β t,k |0, 1/(16T )), where T is the number of trees and K t is the number of leaves in the t th tree. The construction of the distribution on partitions is usually itself done via a tree generating stochastic process; see further details in [14] (and a full description in Appendix A). Given this prior P on T -additive trees, we can condition on (X, y) to obtain a posterior P n (with density p n ). We focus on regression with i.i.d. Gaussian noise, but several generalisations exist