Patricia Robinson

Automating Coreference: The Role of Annotated Training Data

Mar 02, 1998

Lynette Hirschman, Patricia Robinson, John Burger, Marc Vilain

Abstract: We report here on a study of interannotator agreement in the coreference task as defined by the Message Understanding Conference (MUC-6 and MUC-7). Based on feedback from annotators, we clarified and simplified the annotation specification. We then performed an analysis of disagreement among several annotators, concluding that only 16% of the disagreements represented genuine disagreement about coreference; the remainder of the cases were mostly typographical errors or omissions, easily reconciled. Initially, we measured interannotator agreement in the low 80s for precision and recall. To try to improve upon this, we ran several experiments. In our final experiment, we separated the tagging of candidate noun phrases from the linking of actual coreferring expressions. This method shows promise (interannotator agreement climbed to the low 90s), but it needs more extensive validation. These results position the research community to broaden the coreference task to multiple languages, and possibly to different kinds of coreference.
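
The abstract reports agreement as precision and recall but does not spell out the computation. As a rough illustration only, the sketch below assumes the link-based scoring scheme commonly used for the MUC coreference task, treating one annotator's chains as the key and the other's as the response. The function names, the mention identifiers, and the toy annotations are hypothetical and are not taken from the paper.

```python
def muc_recall(key, response):
    """Link-based recall of `response` against `key`.

    Both arguments are lists of sets; each set is one coreference chain
    of mention identifiers. (Assumed scoring scheme, not confirmed by
    the abstract.)
    """
    # Map each mention to the index of the response chain containing it.
    owner = {m: i for i, chain in enumerate(response) for m in chain}
    found_links = 0
    total_links = 0
    for chain in key:
        if len(chain) < 2:
            continue  # a singleton chain contributes no links
        # Partition the key chain by the response chains; mentions the
        # response never linked each count as their own partition.
        parts = set()
        unlinked = 0
        for m in chain:
            if m in owner:
                parts.add(owner[m])
            else:
                unlinked += 1
        n_parts = len(parts) + unlinked
        found_links += len(chain) - n_parts
        total_links += len(chain) - 1
    return found_links / total_links if total_links else 0.0


def muc_precision(key, response):
    """Precision is recall with the roles of key and response swapped."""
    return muc_recall(response, key)


# Hypothetical toy annotations from two annotators.
annotator_a = [{"a1", "a2", "a3"}, {"b1", "b2"}]
annotator_b = [{"a1", "a2"}, {"b1", "b2", "a3"}]

recall = muc_recall(annotator_a, annotator_b)
precision = muc_precision(annotator_a, annotator_b)
print(f"recall={recall:.3f} precision={precision:.3f}")
```

In this toy case the two annotators disagree only on where mention `a3` belongs, which costs one link in each direction, so both scores come out at 2/3.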

* 4 pages, 5 figures. To appear in the AAAI Spring Symposium on Applying Machine Learning to Discourse Processing. The Alembic Workbench annotation tool described in this paper is available at http://www.mitre.org/resources/centers/advanced_info/g04h/workbench.html
