Luke Yeh

ezCoref: Towards Unifying Annotation Guidelines for Coreference Resolution

Oct 13, 2022

Ankita Gupta, Marzena Karpinska, Wenlong Zhao, Kalpesh Krishna, Jack Merullo, Luke Yeh, Mohit Iyyer, Brendan O'Connor

Abstract: Large-scale, high-quality corpora are critical for advancing research in coreference resolution. However, existing datasets vary in their definition of coreferences and have been collected via complex and lengthy guidelines that are curated for linguistic experts. These concerns have sparked a growing interest among researchers to curate a unified set of guidelines suitable for annotators with various backgrounds. In this work, we develop a crowdsourcing-friendly coreference annotation methodology, ezCoref, consisting of an annotation tool and an interactive tutorial. We use ezCoref to re-annotate 240 passages from seven existing English coreference datasets (spanning fiction, news, and multiple other domains) while teaching annotators only cases that are treated similarly across these datasets. Surprisingly, we find that reasonable quality annotations were already achievable (>90% agreement between the crowd and expert annotations) even without extensive training. On carefully analyzing the remaining disagreements, we identify the presence of linguistic cases that our annotators unanimously agree upon but lack unified treatments (e.g., generic pronouns, appositives) in existing datasets. We propose the research community should revisit these phenomena when curating future unified annotation guidelines.

* preprint (19 pages), code in https://github.com/gnkitaa/ezCoref

Investigating Sports Commentator Bias within a Large Corpus of American Football Broadcasts

Oct 19, 2019

Jack Merullo, Luke Yeh, Abram Handler, Alvin Grissom II, Brendan O'Connor, Mohit Iyyer

Abstract: Sports broadcasters inject drama into play-by-play commentary by building team and player narratives through subjective analyses and anecdotes. Prior studies based on small datasets and manual coding show that such theatrics evince commentator bias in sports broadcasts. To examine this phenomenon, we assemble FOOTBALL, which contains 1,455 broadcast transcripts from American football games across six decades that are automatically annotated with 250K player mentions and linked with racial metadata. We identify major confounding factors for researchers examining racial bias in FOOTBALL, and perform a computational analysis that supports conclusions from prior social science studies.
