Cristina Bosco

O-Dang! The Ontology of Dangerous Speech Messages

Jul 13, 2022

Marco A. Stranisci, Simona Frenda, Mirko Lai, Oscar Araque, Alessandra T. Cignarella, Valerio Basile, Viviana Patti, Cristina Bosco

Abstract: Inside the NLP community, a considerable number of language resources are created, annotated and released every day with the aim of studying specific linguistic phenomena. Despite various attempts to organize such resources, systematic methods and interoperability between resources are still lacking. Furthermore, when storing linguistic information, the most common practice is still to rely on the concept of a "gold standard", which contrasts with recent trends in NLP that stress the importance of different subjectivities and points of view when training machine learning and deep learning models. In this paper we present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG) for the collection of linguistically annotated data. O-Dang! is designed to gather and organize Italian datasets into a structured KG, according to the principles shared within the Linguistic Linked Open Data community. The ontology has also been designed to account for a perspectivist approach, since it provides a model for encoding both gold-standard and single-annotator labels in the KG. The paper is structured as follows. Section 1 outlines the motivations of our work. Section 2 describes the O-Dang! Ontology, which provides a common semantic model for the integration of datasets in the KG. Section 3 presents the Ontology Population stage, with information about corpora, users, and annotations. Finally, Section 4 provides an analysis of offensiveness across corpora as a first case study for the resource.

Multilingual Irony Detection with Dependency Syntax and Neural Models

Nov 11, 2020

Alessandra Teresa Cignarella, Valerio Basile, Manuela Sanguinetti, Cristina Bosco, Paolo Rosso, Farah Benamara

Abstract: This paper presents an in-depth investigation of the effectiveness of dependency-based syntactic features on the irony detection task from a multilingual perspective (English, Spanish, French and Italian). It focuses on the contribution of syntactic knowledge, exploiting linguistic resources in which syntax is annotated according to the Universal Dependencies scheme. Three distinct experimental settings are explored. In the first, a variety of dependency-based syntactic features are combined with classical machine learning classifiers. In the second, two well-known types of word embeddings are trained on parsed data and tested against gold-standard datasets. In the third, dependency-based syntactic features are integrated into the Multilingual BERT architecture. The results suggest that fine-grained dependency-based syntactic information is informative for the detection of irony.

* long paper accepted at COLING 2020

Treebanking User-Generated Content: a UD Based Overview of Guidelines, Corpora and Unified Recommendations

Nov 03, 2020

Manuela Sanguinetti, Lauren Cassidy, Cristina Bosco, Özlem Çetinoğlu, Alessandra Teresa Cignarella, Teresa Lynn, Ines Rehbein, Josef Ruppenhofer, Djamé Seddah, Amir Zeldes

Abstract: This article discusses the main linguistic phenomena that cause difficulties in the analysis of user-generated texts found on the web and in social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework of syntactic analysis. Given, on the one hand, the increasing number of treebanks featuring user-generated content and, on the other, its somewhat inconsistent treatment in these resources, the aim of this article is twofold: (1) to provide a condensed, though comprehensive, overview of such treebanks, based on the available literature, along with their main features and a comparative analysis of their annotation criteria; and (2) to propose a set of tentative UD-based annotation guidelines to promote consistent treatment of the particular phenomena found in these types of texts. The overarching goal of this article is to provide a common framework for researchers interested in developing similar resources in UD, thus promoting cross-linguistic consistency, a principle that has always been central to the spirit of UD.
