Abstract: In this position paper, we describe our perspective on how meaningful resources for lower-resourced languages should be developed in connection with the speakers of those languages. We first examine two massively multilingual resources in detail. We explore the contents of the names stored in Wikidata for a few lower-resourced languages and find that many of them are not, in fact, in the languages they claim to be and require non-trivial effort to correct. We discuss quality issues present in WikiAnn and evaluate whether it is a useful supplement to hand-annotated data. We then discuss the importance of creating annotation for lower-resourced languages in a thoughtful and ethical way that includes the languages' speakers as part of the development process. We conclude with recommended guidelines for resource development.
Abstract: To address what we believe is a looming crisis of unreproducible evaluation for named entity recognition (NER) tasks, we present guidelines for reproducible evaluation. The guidelines we propose are extremely simple, focusing on transparency regarding how chunks are encoded and scored, yet very few recently published papers fully comply with them. We demonstrate that despite the apparent simplicity of NER evaluation, unreported differences in the scoring procedure can result in changes to scores that are both of noticeable magnitude and statistically significant. We provide SeqScore, an open-source toolkit that addresses many of the issues that cause replication failures and makes following our guidelines easy.
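To make the claim about chunk encoding and scoring concrete, the sketch below illustrates how two common conventions for handling invalid BIO label sequences extract different chunks from the same system output and therefore yield different F1 scores. This is a minimal, hypothetical illustration, not SeqScore's actual API; the function names and the conlleval-style versus discard conventions are assumptions for the purpose of the example.

```python
# Minimal sketch (not SeqScore's API): two common conventions for handling
# invalid BIO label sequences extract different chunks from the same system
# output, so the same predictions receive different F1 scores.


def chunks_conll_repair(labels):
    """Treat an I- tag that does not legally continue a chunk as if it were
    B- (the repair behavior of the classic conlleval script)."""
    chunks, start, ctype = [], None, None
    for i, label in enumerate(labels + ["O"]):  # sentinel flushes the last chunk
        tag, _, etype = label.partition("-")
        starts_new = tag == "B" or (tag == "I" and etype != ctype)
        if ctype is not None and (tag == "O" or starts_new):
            chunks.append((start, i, ctype))
            ctype = None
        if starts_new:
            start, ctype = i, etype
    return set(chunks)


def chunks_discard_invalid(labels):
    """Ignore I- tags that do not legally continue a chunk (a stricter
    convention used by some other scorers)."""
    chunks, start, ctype = [], None, None
    for i, label in enumerate(labels + ["O"]):
        tag, _, etype = label.partition("-")
        if ctype is not None and not (tag == "I" and etype == ctype):
            chunks.append((start, i, ctype))
            ctype = None
        if tag == "B":
            start, ctype = i, etype
    return set(chunks)


def f1(gold, pred):
    """Chunk-level F1 over sets of (start, end, type) spans."""
    tp = len(gold & pred)
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(gold) if gold else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0


# Gold annotation and an invalid system output ("I-PER" with no preceding B-PER).
gold = chunks_discard_invalid(["B-PER", "I-PER", "O", "B-LOC"])
pred = ["I-PER", "I-PER", "O", "B-LOC"]

print(f1(gold, chunks_conll_repair(pred)))     # repair keeps the PER chunk: F1 = 1.0
print(f1(gold, chunks_discard_invalid(pred)))  # discard drops it: F1 ≈ 0.67
```

Both conventions are defensible; the point of the abstract is that unless a paper states which convention its scorer uses, readers cannot reproduce the reported scores.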