Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

William M. Darling

Probabilistic Topic and Syntax Modeling with Part-of-Speech LDA

Mar 12, 2013

William M. Darling, Fei Song

Figure 1 for Probabilistic Topic and Syntax Modeling with Part-of-Speech LDA

Figure 2 for Probabilistic Topic and Syntax Modeling with Part-of-Speech LDA

Figure 3 for Probabilistic Topic and Syntax Modeling with Part-of-Speech LDA

Figure 4 for Probabilistic Topic and Syntax Modeling with Part-of-Speech LDA

Abstract:This article presents a probabilistic generative model for text based on semantic topics and syntactic classes called Part-of-Speech LDA (POSLDA). POSLDA simultaneously uncovers short-range syntactic patterns (syntax) and long-range semantic patterns (topics) that exist in document collections. This results in word distributions that are specific to both topics (sports, education, ...) and parts-of-speech (nouns, verbs, ...). For example, multinomial distributions over words are uncovered that can be understood as "nouns about weather" or "verbs about law". We describe the model and an approximate inference algorithm and then demonstrate the quality of the learned topics both qualitatively and quantitatively. Then, we discuss an NLP application where the output of POSLDA can lead to strong improvements in quality: unsupervised part-of-speech tagging. We describe algorithms for this task that make use of POSLDA-learned distributions that result in improved performance beyond the state of the art.

* Currently under review for the journal Computational Linguistics

Via

Access Paper or Ask Questions