
Shishuang He

Learning Topic Models: Identifiability and Finite-Sample Analysis

Oct 08, 2021

Yinyin Chen, Shishuang He, Yun Yang, Feng Liang

Figures 1, 2, 3 and 4 for Learning Topic Models: Identifiability and Finite-Sample Analysis

Abstract: Topic models provide a useful text-mining tool for learning, extracting and discovering latent structures in large text corpora. Although a plethora of methods have been proposed for topic modeling, a formal theoretical investigation of the statistical identifiability and accuracy of latent topic estimation is lacking in the literature. In this paper, we propose a maximum likelihood estimator (MLE) of latent topics based on a specific integrated likelihood, which is naturally connected to the concept of volume minimization in computational geometry. Theoretically, we introduce a new set of geometric conditions for topic model identifiability, which are weaker than conventional separability conditions relying on the existence of anchor words or pure topic documents. We conduct a finite-sample error analysis of the proposed estimator and discuss the connection of our results with existing ones. We conclude with empirical studies on both simulated and real datasets.
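The abstract contrasts two geometric ideas: the volume of the simplex spanned by the topics (which the MLE is said to be connected to) and the conventional anchor-word separability condition. The sketch below illustrates both notions under stated assumptions: topics are stored as a hypothetical K x V NumPy array whose rows are word distributions, and the function names `simplex_volume` and `has_anchor_words` are illustrative only. It is not the authors' estimator or their identifiability conditions. The volume uses the standard Gram-determinant formula sqrt(det G) / (K-1)!, where G is the Gram matrix of the edge vectors from the first vertex.

```python
import math

import numpy as np


def simplex_volume(topics: np.ndarray) -> float:
    """(K-1)-dimensional volume of the simplex whose vertices are the K rows
    of `topics` (each row is a word distribution over V words)."""
    k = topics.shape[0]
    edges = topics[1:] - topics[0]          # (K-1) x V edge vectors from vertex 0
    gram = edges @ edges.T                  # (K-1) x (K-1) Gram matrix
    det = max(np.linalg.det(gram), 0.0)     # guard against tiny negative round-off
    return math.sqrt(det) / math.factorial(k - 1)


def has_anchor_words(topics: np.ndarray, tol: float = 0.0) -> bool:
    """Anchor-word (separability) check: every topic must own at least one word
    with probability > tol in that topic and <= tol in all other topics."""
    k = topics.shape[0]
    for j in range(k):
        others = np.delete(topics, j, axis=0)
        exclusive = (topics[j] > tol) & np.all(others <= tol, axis=0)
        if not exclusive.any():
            return False
    return True


if __name__ == "__main__":
    # Two topics over three words; words 0 and 1 act as anchor words.
    separable = np.array([[0.5, 0.0, 0.5], [0.0, 0.5, 0.5]])
    print(has_anchor_words(separable), simplex_volume(separable))

    # Every word has positive probability in both topics: no anchor words.
    beta = np.array([[0.6, 0.2, 0.2], [0.2, 0.6, 0.2]])
    # Documents are mixtures of the topics, so they lie inside their simplex.
    weights = np.random.default_rng(0).dirichlet([1.0, 1.0], size=5)
    docs = weights @ beta

    # A wider candidate simplex also contains these documents but has larger
    # volume, which is the intuition behind preferring minimum volume.
    wider = np.array([[0.7, 0.1, 0.2], [0.1, 0.7, 0.2]])
    print(has_anchor_words(beta), simplex_volume(beta), simplex_volume(wider))
```

With K = 2 the "volume" reduces to the length of the segment between the two topics; for K = 3 it is the area of a triangle in the V-dimensional probability simplex.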
