NeurIPS 2020

Stochastic Stein Discrepancies

Jackson Gorham, Anant Raj, Lester Mackey


Abstract

Stein discrepancies (SDs) monitor convergence and non-convergence in approximate inference when exact integration and sampling are intractable. However, the computation of a Stein discrepancy can be prohibitive if the Stein operator (often a sum over likelihood terms or potentials) is expensive to evaluate. To address this deficiency, we show that stochastic Stein discrepancies (SSDs) based on subsampled approximations of the Stein operator inherit the convergence control properties of standard SDs with probability 1. Along the way, we establish the convergence of Stein variational gradient descent (SVGD) on unbounded domains, resolving an open question of Liu (2017). In our experiments with biased Markov chain Monte Carlo (MCMC) hyperparameter tuning, approximate MCMC sampler selection, and stochastic SVGD, SSDs deliver comparable inferences to standard SDs with orders of magnitude fewer likelihood evaluations.

Introduction

Markov chain Monte Carlo (MCMC) methods [7] provide asymptotically correct sample estimates of the integrals that arise in Bayesian inference, maximum likelihood estimation [20], and probabilistic inference more broadly. However, MCMC methods often require cycling through a large dataset or a large set of factors to produce each new sample point x_i. To avoid this computational burden, many have turned to scalable approximate MCMC methods [e.g., 1, 8, 14, 39, 50], which mimic standard MCMC procedures while using only a small subsample of datapoints to generate each new sample point. These techniques reduce Monte Carlo variance by delivering larger sample sizes in less time but sacrifice asymptotic correctness by introducing a persistent bias. This bias creates new difficulties for sampler monitoring, selection, and hyperparameter tuning, as standard MCMC diagnostics, like trace plots and effective sample size, rely upon asymptotic exactness.
To effectively assess the quality of approximate MCMC outputs, a line of work [9, 21-23, 27, 35] developed computable Stein discrepancies (SDs) that quantify the maximum discrepancy between sample and target expectations and provably track sample convergence to the target P, even when explicit integration and direct sampling from P are intractable. SDs have since been used to compare approximate MCMC procedures [2], test goodness of fit [11, 27, 28, 34], train generative models [40, 48], generate particle approximations [9, 10, 19], improve particle approximations [25, 32, 33], compress samples [42], conduct variational inference [41], and estimate parameters in intractable models [5].
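To make the idea of subsampling the Stein operator concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a one-dimensional toy posterior (standard normal prior, unit-variance Gaussian likelihood terms), a kernel Stein discrepancy built from the Langevin Stein operator with a Gaussian kernel, and an independent minibatch of m likelihood terms drawn for each sample point. All function names (`full_score`, `subsampled_score`, `stein_discrepancy`) and parameter values are hypothetical and chosen only for illustration.

```python
import math
import random

def full_score(x, data):
    # Gradient of the log posterior at x: the prior term plus all L likelihood terms.
    return -x + sum(y - x for y in data)

def subsampled_score(x, data, m, rng):
    # Unbiased estimate of full_score using m of the L terms, rescaled by L / m.
    # The prior contribution is split evenly across the L terms (an assumption).
    L = len(data)
    batch = rng.sample(data, m)  # drawn without replacement
    return (L / m) * sum(-x / L + (y - x) for y in batch)

def stein_discrepancy(xs, scores, h=1.0):
    # V-statistic kernel Stein discrepancy with Gaussian kernel of bandwidth h.
    # scores[i] is the (exact or subsampled) score evaluated at xs[i].
    n = len(xs)
    total = 0.0
    for xi, si in zip(xs, scores):
        for xj, sj in zip(xs, scores):
            r = xi - xj
            k = math.exp(-r * r / (2 * h * h))
            # Stein kernel: s_i s_j k + s_i dk/dy + s_j dk/dx + d2k/dxdy
            total += k * (si * sj + (si - sj) * r / h**2 + 1 / h**2 - r * r / h**4)
    return math.sqrt(max(total, 0.0)) / n

rng = random.Random(0)
L, n, m = 50, 100, 5
data = [rng.gauss(1.0, 1.0) for _ in range(L)]

# Exact posterior for this toy model is N(sum(data) / (L + 1), 1 / (L + 1)).
post_mean, post_sd = sum(data) / (L + 1), (L + 1) ** -0.5
good = [rng.gauss(post_mean, post_sd) for _ in range(n)]  # accurate sample
bad = [x + 1.0 for x in good]                             # biased sample

sd_good = stein_discrepancy(good, [full_score(x, data) for x in good])
sd_bad = stein_discrepancy(bad, [full_score(x, data) for x in bad])
ssd_good = stein_discrepancy(good, [subsampled_score(x, data, m, rng) for x in good])
ssd_bad = stein_discrepancy(bad, [subsampled_score(x, data, m, rng) for x in bad])
print(sd_good, sd_bad, ssd_good, ssd_bad)
```

In this sketch the standard discrepancy touches n * L likelihood terms, while the subsampled version touches only n * m. The subsampled scores are noisier, so the stochastic value differs from the exact one, but in this toy setting both should still rank the biased sample as much worse than the accurate one.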