NeurIPS 2020

Recursive Inference for Variational Autoencoders

Minyoung Kim, Vladimir Pavlovic


Abstract

Inference networks of traditional Variational Autoencoders (VAEs) are typically amortized, which often yields less accurate posterior approximations than instance-wise variational optimization. Recent semi-amortized approaches were proposed to address this drawback; however, their iterative gradient update procedures can be computationally demanding. To address these issues, in this paper we introduce an accurate amortized inference algorithm. We propose a novel recursive mixture estimation algorithm for VAEs that iteratively augments the current mixture with new components so as to maximally reduce the divergence between the variational and the true posteriors. Using the functional gradient approach, we devise an intuitive learning criterion for selecting a new mixture component: the new component has to improve the data likelihood (lower bound) and, at the same time, be as divergent from the current mixture distribution as possible, thus increasing representational diversity. Unlike the recently proposed boosted variational inference (BVI), which solves a single non-amortized optimization instance, our method relies on amortized inference. A crucial benefit of our approach is that inference at test time requires only a single feed-forward pass through the mixture inference network, making it significantly faster than semi-amortized approaches. We show that our approach yields higher test data likelihood than state-of-the-art methods on several benchmark datasets.

Introduction

Accurately modeling complex generative processes for high-dimensional data (e.g., images) is a key task in deep learning. In many application fields, the Variational Autoencoder (VAE) [13, 29] has been shown to be very effective for this task. It also makes it possible to interpret and directly control the latent variables that correspond to the hidden factors underlying data generation, a critical benefit over synthesis-only models such as GANs [7].
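As a rough illustration of the recursive idea in the abstract (grow a mixture one component at a time so that the variational posterior moves closer to the true one), here is a toy, non-amortized sketch in plain Python. Everything in it is an assumption made for illustration, not the authors' algorithm: a one-dimensional latent, a known bimodal "true posterior", Gaussian components with a fixed scale `SIGMA`, quadrature in place of Monte Carlo, and a brute-force grid search over the new component's mean and mixing weight in place of the paper's functional-gradient criterion and amortized mixture encoder. The helper names (`mixture`, `elbo`, `augment`) are hypothetical.

```python
import math

# Toy setting (all assumptions): 1-D latent z, quadrature on a fixed grid.
DZ = 0.02
GRID = [-8.0 + DZ * i for i in range(801)]
SIGMA = 0.5  # fixed scale shared by every mixture component (assumed)


def normal_pdf(z, mu, sigma=SIGMA):
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical "true" posterior p(z | x): an equal-weight bimodal mixture. It is
# normalized, so log p(x) = 0 and ELBO(q) = -KL(q || p) <= 0.
LOG_P = [math.log(0.5 * normal_pdf(z, -2.0) + 0.5 * normal_pdf(z, 2.0)) for z in GRID]


def mixture(weights, means):
    """Values of sum_k w_k N(z; mu_k, SIGMA^2) on GRID."""
    return [sum(w * normal_pdf(z, m) for w, m in zip(weights, means)) for z in GRID]


def elbo(q):
    """E_q[log p(x, z) - log q(z)], approximated by a Riemann sum on GRID."""
    return DZ * sum(qi * (lp - math.log(qi)) for qi, lp in zip(q, LOG_P) if qi > 0.0)


def augment(weights, means, mu_grid, eps_grid):
    """One greedy round: q_new = (1 - eps) * q + eps * N(mu, SIGMA^2), with (mu, eps)
    picked by brute-force search to maximize ELBO(q_new). Returns the mixture
    unchanged if no candidate strictly improves the bound."""
    q = mixture(weights, means)
    best_val, best_mu, best_eps = elbo(q), None, 0.0
    for mu in mu_grid:
        h = [normal_pdf(z, mu) for z in GRID]
        for eps in eps_grid:
            val = elbo([(1.0 - eps) * qi + eps * hi for qi, hi in zip(q, h)])
            if val > best_val:
                best_val, best_mu, best_eps = val, mu, eps
    if best_mu is None:
        return weights, means
    return [w * (1.0 - best_eps) for w in weights] + [best_eps], means + [best_mu]


# Start from one component on the left mode (a stand-in for what a single-Gaussian
# encoder might output) and run two augmentation rounds.
weights, means = [1.0], [-2.0]
history = [elbo(mixture(weights, means))]
for _ in range(2):
    weights, means = augment(weights, means,
                             mu_grid=[-4.0 + 0.25 * k for k in range(33)],
                             eps_grid=[0.05 * k for k in range(21)])
    history.append(elbo(mixture(weights, means)))
print("weights:", weights, "means:", means, "ELBO per round:", history)
```

In this toy run the first round picks the component at the mode the current mixture misses (z = 2) with weight 0.5, which recovers the target exactly; the second round finds nothing that improves the bound. Because eps = 0 is always a candidate, the ELBO never decreases across rounds. The search rewards placing mass where the current mixture has little, loosely echoing the divergence term in the abstract's criterion, but a brute-force grid over a 1-D latent does not scale to real VAE latents.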
The VAE uses an inference network (a.k.a. encoder) that performs test-time inference with a single feed-forward pass through a neural network. Although this feature, known as amortized inference, lets the VAE avoid the otherwise time-consuming instance-wise variational optimization at test time, it often yields less accurate posterior approximations than instance-wise optimization [4]. Recently, semi-amortized approaches have been proposed to address this drawback. The main idea is to use an amortized encoder to produce a reasonable initial iterate, followed by instance-wise posterior fine-tuning (e.g., a few gradient steps) to improve the posterior approximation [11, 14, 23, 27]. This is similar to the test-time model adaptation in MAML [5] for multi-task (meta-)learning. However, these iterative gradient updates can be computationally expensive at both training and test time: during training, some of the methods require Hessian-vector products for backpropagation, while at test time, extra gradient steps must be performed to fine-tune the variational optimization. Moreover, the performance of these approaches is often very sensitive to the choice of the gradient step size and the number of gradient updates.

34th Conference on Neural Information Processing Systems (NeurIPS 2020),
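The semi-amortized recipe described above (encoder initialization followed by a few gradient steps) can be sketched on a toy linear-Gaussian model where the ELBO gradient with respect to the variational mean has a closed form. The model, the hypothetical `amortized_encoder` with weight `w=0.15`, and the step sizes are illustrative assumptions; this is not any of the cited methods [11, 14, 23, 27], and it covers only the test-time refinement loop without differentiating through it during training. For this model the gradient equals (A/S2)·x − PREC·m, so the error shrinks by a factor of (1 − η·PREC) per step and plain gradient ascent converges only for 0 < η < 2/PREC. That makes the step-size sensitivity concrete: with PREC = 9, η = 0.01 converges slowly, η = 0.1 converges quickly, and η = 0.25 diverges.

```python
# Toy linear-Gaussian model (an assumption for illustration only):
#   p(z) = N(0, 1),   p(x | z) = N(x; A * z, S2)
A, S2 = 2.0, 0.5
PREC = 1.0 + A * A / S2  # exact posterior precision (here 9.0)


def exact_posterior_mean(x):
    """Mean of the exact Gaussian posterior p(z | x), for reference."""
    return (A / S2) * x / PREC


def elbo_grad_mean(m, x):
    """Gradient of the ELBO w.r.t. the mean m of q = N(m, v).

    Only E_q[log p(x | z) + log p(z)] depends on m; the entropy of q does not.
    """
    return (A / S2) * (x - A * m) - m


def amortized_encoder(x, w=0.15):
    """Hypothetical one-pass encoder: a deliberately imperfect linear map x -> m.
    The exact map for this model would use w = (A / S2) / PREC = 4/9."""
    return w * x


def semi_amortized_mean(x, step_size, num_steps):
    """Encoder initialization followed by plain gradient ascent on the ELBO."""
    m = amortized_encoder(x)
    for _ in range(num_steps):
        m += step_size * elbo_grad_mean(m, x)
    return m


x = 1.0
target = exact_posterior_mean(x)
for eta in (0.01, 0.1, 0.25):
    err = abs(semi_amortized_mean(x, eta, num_steps=5) - target)
    print(f"step size {eta}: |m - m*| after 5 steps = {err:.3g}")
```

Each extra refinement step costs another gradient evaluation per test instance, which is the overhead a single feed-forward pass avoids.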