Abstract:In text mining, topic models are a type of probabilistic generative models for inferring latent semantic topics from text corpus. One of the most popular inference approaches to topic models is perhaps collapsed Gibbs sampling (CGS), which typically samples one single topic label for each observed document-word pair. In this paper, we aim at improving the inference of CGS for topic models. We propose to leverage state augmentation technique by maximizing the number of topic samples to infinity, and then develop a new inference approach, called infinite latent state replication (ILR), to generate robust soft topic assignment for each given document-word pair. Experimental results on the publicly available datasets show that ILR outperforms CGS for inference of existing established topic models.
Abstract:Bayesian graphical models have been shown to be a powerful tool for discovering uncertainty and causal structure from real-world data in many application fields. Current inference methods primarily follow different kinds of trade-offs between computational complexity and predictive accuracy. At one end of the spectrum, variational inference approaches perform well in computational efficiency, while at the other end, Gibbs sampling approaches are known to be relatively accurate for prediction in practice. In this paper, we extend an existing Gibbs sampling method, and propose a new deterministic Heron inference (Heron) for a family of Bayesian graphical models. In addition to the support for nontrivial distributability, one more benefit of Heron is that it is able to not only allow us to easily assess the convergence status but also largely improve the running efficiency. We evaluate Heron against the standard collapsed Gibbs sampler and state-of-the-art state augmentation method in inference for well-known graphical models. Experimental results using publicly available real-life data have demonstrated that Heron significantly outperforms the baseline methods for inferring Bayesian graphical models.