ICML 2024

Batch and match: black-box variational inference with a score-based divergence

Diana Cai, Chirag Modi, Loucas Pillaud-Vivien, Charles Margossian, Robert M. Gower, David M. Blei, Lawrence K. Saul


Abstract

Most leading implementations of black-box variational inference (BBVI) are based on optimizing a stochastic evidence lower bound (ELBO). But such approaches to BBVI often converge slowly due to the high variance of their gradient estimates and their sensitivity to hyperparameters. In this work, we propose batch and match (BaM), an alternative approach to BBVI based on a score-based divergence. Notably, this score-based divergence can be optimized by a closed-form proximal update for Gaussian variational families with full covariance matrices. We analyze the convergence of BaM when the target distribution is Gaussian, and we prove that in the limit of infinite batch size the variational parameter updates converge exponentially quickly to the target mean and covariance. We also evaluate the performance of BaM on Gaussian and non-Gaussian target distributions that arise from posterior inference in hierarchical and deep generative models. In these experiments, we find that BaM typically converges in fewer (and sometimes significantly fewer) gradient evaluations than leading implementations of BBVI based on ELBO maximization.

1. Introduction

Probabilistic modeling plays a fundamental role in many problems of inference and decision-making, but it can be challenging to develop accurate probabilistic models that remain computationally tractable. In typical applications, the goal is to estimate a target distribution that cannot be evaluated or sampled from exactly, but where an unnormalized form is available. A canonical situation is applied Bayesian statistics, where the target is a posterior distribution of latent variables given observations, but where only the model's joint distribution is available in closed form. Variational inference (VI) has emerged as a leading method for fast approximate inference (Blei et al., 2017; Jordan et al., 1999; Wainwright et al., 2008).
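The abstract contrasts BaM with BBVI based on a stochastic ELBO. The sketch below shows what such an ELBO estimate looks like in practice. It makes a few assumptions:

- The variational family is a full-covariance Gaussian q = N(mean, cov).
- The code is written with NumPy.
- The helpers `gaussian_logpdf` and `elbo_estimate` are hypothetical names chosen for illustration. They are not taken from the paper.

The estimator only needs pointwise evaluations of an unnormalized log target, `log_target`. It averages log p̃(z) − log q(z) over reparameterized draws z = mean + L·eps, where L is the Cholesky factor of `cov`.

```python
import numpy as np


def gaussian_logpdf(z, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of z (shape [S, D])."""
    d = mean.shape[0]
    chol = np.linalg.cholesky(cov)  # cov = L L^T, L lower triangular
    # Solve L y = (z - mean)^T so that sum(y**2) = (z - mean)^T cov^{-1} (z - mean).
    y = np.linalg.solve(chol, (z - mean).T)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (np.sum(y**2, axis=0) + log_det + d * np.log(2.0 * np.pi))


def elbo_estimate(log_target, mean, cov, num_samples, rng):
    """Monte Carlo ELBO estimate for a full-covariance Gaussian q = N(mean, cov).

    Only pointwise evaluations of the unnormalized log target are needed
    (the "black-box" requirement); no normalizing constant is required.
    """
    d = mean.shape[0]
    chol = np.linalg.cholesky(cov)
    # Reparameterized draws: z = mean + L eps with eps ~ N(0, I).
    eps = rng.standard_normal((num_samples, d))
    z = mean + eps @ chol.T
    # Average of log p~(z) - log q(z) over the batch of draws.
    return np.mean(log_target(z) - gaussian_logpdf(z, mean, cov))
```

Consider what happens when q exactly matches a Gaussian target whose unnormalized log density is offset by log Z. Every draw then contributes exactly log Z, so the estimate has zero variance. For any mismatched q, the estimate is a noisy quantity whose expectation lies below log Z. Gradients taken through the reparameterized draws inherit this noise. This is the source of the gradient variance that the abstract cites as a weakness of ELBO-based BBVI.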
The idea behind VI is to posit a parameterized family of approximating distributions, and then to find the member of that family which is closest to the target distribution. Recently, VI methods have become increasingly "black box," in that they only require calculation of the log of the unnormalized target and (for some algorithms) its gradients (