Title: Word Clustering and Disambiguation Based on Co-occurrence Data

Jul 17, 1998

Hang Li, Naoki Abe

Abstract: We address the problem of clustering words (or constructing a thesaurus) based on co-occurrence data, and of using the acquired word classes to improve the accuracy of syntactic disambiguation. We view this problem as that of estimating a joint probability distribution over word pairs, such as noun-verb pairs. We propose an efficient algorithm based on the Minimum Description Length (MDL) principle for estimating such a probability distribution. Our method is a natural extension of those proposed in (Brown et al 92) and (Li & Abe 96), and overcomes their drawbacks while retaining their advantages. We then combine this clustering method with the disambiguation method of (Li & Abe 95) to derive a disambiguation method that makes use of both automatically constructed thesauruses and a hand-made thesaurus. The overall disambiguation accuracy achieved by our method is 85.2%, which compares favorably with the accuracy (82.4%) obtained by the state-of-the-art disambiguation method of (Brill & Resnik 94).
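To illustrate how the MDL principle can be used to compare candidate word clusterings, here is a minimal sketch in Python. It is not the authors' algorithm. It assumes a generic two-part MDL score for a hard-clustering model in which P(n, v) = P(C_n, C_v) · P(n | C_n) · P(v | C_v), with maximum-likelihood estimates and a model cost of (k/2) · log2(m) bits for k free parameters and m observed pairs. The function name `description_length` and the toy data are hypothetical.

```python
import math
from collections import Counter

def description_length(pairs, noun_class, verb_class):
    """Two-part MDL score in bits (a sketch, not the paper's exact formulation).

    pairs      : list of observed (noun, verb) co-occurrences
    noun_class : dict mapping each noun to a class label
    verb_class : dict mapping each verb to a class label
    """
    m = len(pairs)
    class_pair = Counter((noun_class[n], verb_class[v]) for n, v in pairs)
    noun_count = Counter(n for n, _ in pairs)
    verb_count = Counter(v for _, v in pairs)
    noun_class_count = Counter(noun_class[n] for n, _ in pairs)
    verb_class_count = Counter(verb_class[v] for _, v in pairs)

    # Data description length: negative log-likelihood of the sample
    # under maximum-likelihood parameter estimates.
    data_bits = 0.0
    for n, v in pairs:
        cn, cv = noun_class[n], verb_class[v]
        p = (class_pair[(cn, cv)] / m) \
            * (noun_count[n] / noun_class_count[cn]) \
            * (verb_count[v] / verb_class_count[cv])
        data_bits -= math.log2(p)

    # Model description length: (k / 2) * log2(m) for k free parameters.
    k_n = len(set(noun_class.values()))
    k_v = len(set(verb_class.values()))
    free_params = (k_n * k_v - 1) + (len(noun_class) - k_n) + (len(verb_class) - k_v)
    model_bits = 0.5 * free_params * math.log2(m)

    return data_bits + model_bits

# Toy data (hypothetical): four noun-verb pairs, each observed once.
pairs = [("wine", "drink"), ("beer", "drink"), ("bread", "eat"), ("rice", "eat")]
verbs = {"drink": "V1", "eat": "V2"}

# Every noun in its own class versus nouns merged into two classes.
singletons = {"wine": "N1", "beer": "N2", "bread": "N3", "rice": "N4"}
merged = {"wine": "BEVERAGE", "beer": "BEVERAGE", "bread": "FOOD", "rice": "FOOD"}

dl_singletons = description_length(pairs, singletons, verbs)  # 15.0 bits
dl_merged = description_length(pairs, merged, verbs)          # 13.0 bits
print(dl_singletons, dl_merged)
```

In this toy case both clusterings fit the data equally well, but the merged clustering needs fewer parameters and so has the shorter total description length. A bottom-up clustering procedure could, under these assumptions, repeatedly apply the merge that lowers this score the most and stop when no merge lowers it.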

* latex file, uses colacl.sty file and 4 eps files, to appear in Proc. of COLING-ACL'98, 8 pages
