João V. Graça

Controlling Complexity in Part-of-Speech Induction

Jan 16, 2014

João V. Graça, Kuzman Ganchev, Luisa Coheur, Fernando Pereira, Ben Taskar

Abstract: We consider the problem of fully unsupervised learning of grammatical (part-of-speech) categories from unlabeled text. The standard maximum-likelihood hidden Markov model for this task performs poorly, because of its weak inductive bias and large model capacity. We address this problem by refining the model and modifying the learning objective to control its capacity via parametric and non-parametric constraints. Our approach enforces word-category association sparsity, adds morphological and orthographic features, and eliminates hard-to-estimate parameters for rare words. We develop an efficient learning algorithm that is not much more computationally intensive than standard training. We also provide an open-source implementation of the algorithm. Our experiments on five diverse languages (Bulgarian, Danish, English, Portuguese, Spanish) achieve significant improvements compared with previous methods for the same task.

* Journal of Artificial Intelligence Research, Volume 41, pages 527-551, 2011
