Chaitanya Chemudugunta

Text Modeling using Unsupervised Topic Models and Concept Hierarchies

Aug 07, 2008

Chaitanya Chemudugunta, Padhraic Smyth, Mark Steyvers

Abstract: Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. While topic models can potentially discover a broad range of themes in a data set, the interpretability of the learned topics is not always ideal. Human-defined concepts, on the other hand, tend to be semantically richer due to careful selection of words to define concepts but they tend not to cover the themes in a data set exhaustively. In this paper, we propose a probabilistic framework to combine a hierarchy of human-defined semantic concepts with statistical topic models to seek the best of both worlds. Experimental results using two different sources of concept hierarchies and two collections of text documents indicate that this combination leads to systematic improvements in the quality of the associated language models as well as enabling new techniques for inferring and visualizing the semantics of a document.
