Part-of-Speech Tagging on an Endangered Language: a Parallel Griko-Italian Resource

Jun 11, 2018

Antonis Anastasopoulos, Marika Lekakou, Josep Quer, Eleni Zimianiti, Justin DeBenedetto, David Chiang

Abstract: Most work on part-of-speech (POS) tagging focuses on high-resource languages or examines low-resource and active-learning settings through simulated studies. We evaluate POS tagging techniques on an actual endangered language, Griko. We present a resource that contains 114 narratives in Griko, along with sentence-level translations in Italian, and that provides gold annotations for the test set. Using a previously collected small corpus, we investigate several traditional methods, as well as methods that take advantage of monolingual data or project POS tags across languages. We show that combining a semi-supervised method with cross-lingual transfer is more appropriate for this extremely challenging setting, with the best tagger achieving an accuracy of 72.9%. With an active learning scheme, which we use to collect sentence-level annotations over the test set, we achieve improvements of more than 21 percentage points.
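The abstract mentions projecting POS tags across languages and reporting tagging accuracy. As a rough illustration only, the sketch below shows a generic form of alignment-based tag projection from a tagged source sentence (e.g. Italian) onto its target translation (e.g. Griko), plus token-level accuracy. It is not the authors' method. The function names `project_tags` and `token_accuracy`, the `(src_index, tgt_index)` alignment format, majority voting for multiply aligned tokens, and the fallback tag `"X"` for unaligned tokens are all hypothetical assumptions chosen for the example.

```python
from collections import Counter, defaultdict


def project_tags(src_tags, alignment, tgt_len, default="X"):
    """Hypothetical sketch: copy POS tags from source tokens to target tokens.

    src_tags:  list of tags for the source (e.g. Italian) sentence.
    alignment: iterable of (src_index, tgt_index) word-alignment links
               (assumed format, e.g. produced by an external aligner).
    tgt_len:   number of tokens in the target (e.g. Griko) sentence.
    A target token aligned to several source tokens takes the most frequent
    tag (ties go to the tag encountered first); unaligned tokens get `default`.
    """
    votes = defaultdict(Counter)
    for s, t in alignment:
        votes[t][src_tags[s]] += 1
    return [
        votes[t].most_common(1)[0][0] if votes[t] else default
        for t in range(tgt_len)
    ]


def token_accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for pred_sent, gold_sent in zip(predicted, gold):
        if len(pred_sent) != len(gold_sent):
            raise ValueError("sentence lengths differ")
        correct += sum(p == g for p, g in zip(pred_sent, gold_sent))
        total += len(gold_sent)
    return correct / total


if __name__ == "__main__":
    # Toy example with made-up tags and alignments (not data from the paper).
    projected = project_tags(["DET", "NOUN", "VERB", "ADJ"],
                             [(0, 0), (1, 1), (2, 2), (3, 2)], 4)
    print(projected)  # ['DET', 'NOUN', 'VERB', 'X']
    print(token_accuracy([projected], [["DET", "NOUN", "VERB", "ADJ"]]))  # 0.75
```

In practice, projected tags like these are noisy, for example when a target token has no aligned source word. That noise is one reason why combining projection with other signals, such as the semi-supervised training the abstract describes, can help.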

* to be presented at COLING 2018