Kaisla Kajava

XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection

Nov 06, 2020

Emily Öhman, Marc Pàmies, Kaisla Kajava, Jörg Tiedemann

Abstract: We introduce XED, a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages. We use Plutchik's core emotions to annotate the dataset with the addition of neutral to create a multilabel multiclass dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.
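
To make "multilabel multiclass" concrete, here is a minimal sketch of how one sentence's annotation could be encoded as a multi-hot vector over Plutchik's eight core emotions plus neutral. The label order, the `LABELS` list, and the helper names `to_multi_hot` and `from_multi_hot` are assumptions made for illustration; they are not taken from the XED release and may not match its actual label IDs or file format.

```python
from typing import Iterable, List

# Plutchik's eight core emotions plus the "neutral" class named in the abstract.
# ASSUMPTION: this ordering is arbitrary and is not XED's own label-ID scheme.
LABELS = [
    "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust", "neutral",
]


def to_multi_hot(emotions: Iterable[str]) -> List[int]:
    """Encode a set of emotion names as a 0/1 vector aligned with LABELS."""
    active = set(emotions)
    unknown = active - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in active else 0 for label in LABELS]


def from_multi_hot(vector: List[int]) -> List[str]:
    """Decode a 0/1 vector back into emotion names, in LABELS order."""
    if len(vector) != len(LABELS):
        raise ValueError("vector length does not match the label set")
    return [label for label, flag in zip(LABELS, vector) if flag]


if __name__ == "__main__":
    # A sentence annotated with two emotions at once (the multilabel case).
    encoded = to_multi_hot(["joy", "trust"])
    print(encoded)
    print(from_multi_hot(encoded))
```

A vector like this is the usual target format for a classifier with one independent output per label, which is one common way to train on multilabel data.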

* Accepted at COLING 2020

LT@Helsinki at SemEval-2020 Task 12: Multilingual or language-specific BERT?

Aug 03, 2020

Marc Pàmies, Emily Öhman, Kaisla Kajava, Jörg Tiedemann

Abstract: This paper presents the different models submitted by the LT@Helsinki team for the SemEval 2020 Shared Task 12. Our team participated in sub-tasks A and C, titled offensive language identification and offense target identification, respectively. In both cases we used Bidirectional Encoder Representations from Transformers (BERT), a model pre-trained by Google and fine-tuned by us on the OLID and SOLID datasets. The results show that offensive tweet classification is one of several language-based tasks where BERT can achieve state-of-the-art results.

* Accepted at SemEval-2020 Task 12. Identical to the camera-ready version except where adjustments to fit arXiv requirements were necessary.
