NeurIPS 2022

Alleviating "Posterior Collapse" in Deep Topic Models via Policy Gradient

Yewen Li, Chaojie Wang, Zhibin Duan, Dongsheng Wang, Bo Chen, Bo An, Mingyuan Zhou

Cited 11 times

Abstract

Deep topic models have proven to be a promising way to extract hierarchical latent representations from documents represented as high-dimensional bag-of-words (BoW) vectors. However, the representation capability of existing deep topic models is still limited by "posterior collapse", a phenomenon widely criticized in deep generative models, in which the higher-level latent representations exhibit similar or meaningless patterns. To address this, we first develop a novel deep-coupling generative process for existing deep topic models, which incorporates skip connections into the generation of documents and thereby enforces strong links between a document and its multi-layer latent representations. Then, using data augmentation techniques, we reformulate the deep-coupling generative process as a Markov decision process and develop a corresponding Policy Gradient (PG) based training algorithm, which further alleviates the information reduction at higher layers. Extensive experiments demonstrate that the proposed methods effectively alleviate "posterior collapse" in deep topic models and yield higher-quality latent document representations.

Recently, benefiting from the development of deep neural networks (DNNs), there has been emerging research interest in developing neural topic models (NTMs) that use DNNs to boost the performance, efficiency, and usability of topic modeling. Specifically, following the variational autoencoder (VAE) framework [10], most NTMs [11, 12, 13] construct a variational inference network (encoder) to project each document into a stochastic latent representation, and then reconstruct the corresponding BoW observation with a stochastic or deterministic decoder. By modeling the inference and generative processes with DNNs, these NTMs are more flexible and scalable than traditional Bayesian probabilistic topic models (PTMs), which makes them suitable for large-scale downstream tasks, especially in NLP [14, 15].
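The encoder/decoder structure of a VAE-style NTM described above can be illustrated with a minimal single-layer sketch. This is not the deep-coupling model or the PG training algorithm of this paper. The Gaussian latent with a softmax link, the linear encoder, and every name below (`ntm_forward`, `enc_mu`, `enc_logvar`, `topic_word_logits`) are assumptions chosen for brevity, not taken from the source or from the cited NTMs.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) nested-list matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def ntm_forward(bow, enc_mu, enc_logvar, topic_word_logits, rng):
    """One stochastic forward pass; returns (theta, word_probs, neg_elbo)."""
    # Encoder (assumed linear): BoW counts -> Gaussian posterior parameters.
    mu = matvec(enc_mu, bow)
    logvar = matvec(enc_logvar, bow)
    # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
         for m, lv in zip(mu, logvar)]
    theta = softmax(z)  # document-topic proportions
    # Decoder: mix the per-topic word distributions with weights theta.
    topics = [softmax(row) for row in topic_word_logits]  # K x V
    word_probs = [sum(theta[k] * topics[k][v] for k in range(len(theta)))
                  for v in range(len(bow))]
    # Reconstruction term: negative multinomial log-likelihood,
    # omitting the count-dependent constant.
    recon = -sum(c * math.log(p) for c, p in zip(bow, word_probs))
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian.
    kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                   for m, lv in zip(mu, logvar))
    return theta, word_probs, recon + kl


# Hypothetical toy setup: vocabulary of 6 words, 2 topics, random weights.
rng = random.Random(0)
V, K = 6, 2
bow = [2, 0, 1, 0, 3, 0]
enc_mu = [[rng.uniform(-0.1, 0.1) for _ in range(V)] for _ in range(K)]
enc_logvar = [[rng.uniform(-0.1, 0.1) for _ in range(V)] for _ in range(K)]
topic_word_logits = [[rng.gauss(0.0, 1.0) for _ in range(V)] for _ in range(K)]

theta, word_probs, loss = ntm_forward(bow, enc_mu, enc_logvar,
                                      topic_word_logits, rng)
```

In this toy setting, `theta` is the stochastic latent representation of the document and `word_probs` is the decoder's reconstruction of its BoW distribution. A hierarchical model would stack several such latent layers, and the higher layers are where the abstract locates "posterior collapse".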