ACL 2021

Benchmarking Neural Topic Models: An Empirical Study

Thanh-Nam Doan, Tuan-Anh Hoang

Abstract

Neural topic modeling has attracted much attention recently because it combines the advantages of neural networks and probabilistic topic models. Previous works have proposed several models based on this framework and reported impressive experimental results compared to traditional probabilistic models. However, the reported results are not consistent across these works, making it hard to gain a rigorous assessment of these approaches. This work aims to address this issue by offering an extensive empirical evaluation of typical neural topic models in different aspects, using large, diverse datasets as well as a thorough set of metrics. Specifically, we examine the performance of these models in three tasks: uncovering cohesive topics, modeling the input documents, and representing them for downstream classification. Our results show that while the neural topic models are better in the first and third tasks, the traditional probabilistic models remain a strong baseline and are better in the second task in many cases. These findings give us more insight for choosing off-the-shelf topic modeling toolboxes in different contexts, as well as for designing more comprehensive evaluations of neural topic models.