Title: Annotating Norwegian Language Varieties on Twitter for Part-of-Speech

Oct 12, 2022

Petter Mæhlum, Andre Kåsen, Samia Touileb, Jeremy Barnes

Abstract: Norwegian Twitter data poses an interesting challenge for Natural Language Processing (NLP) tasks. These texts are difficult for models trained on standardized text in one of the two Norwegian written forms (Bokmål and Nynorsk), as they contain both the typical variation of social media text and a large amount of dialectal variety. In this paper, we present a novel Norwegian Twitter dataset annotated with POS tags. We show that models trained on Universal Dependencies (UD) data perform worse when evaluated against this dataset, and that models trained on Bokmål generally perform better than those trained on Nynorsk. We also see that, for some models, performance on dialectal tweets is comparable to performance on the written standards. Finally, we perform a detailed analysis of the errors that models commonly make on this data.

* Accepted at the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2022), co-located with COLING 2022
