Rahul Ponnusamy

Findings of the Shared Task on Offensive Span Identification from Code-Mixed Tamil-English Comments

May 12, 2022

Manikandan Ravikiran, Bharathi Raja Chakravarthi, Anand Kumar Madasamy, Sangeetha Sivanesan, Ratnavel Rajalakshmi, Sajeetha Thavareesan, Rahul Ponnusamy, Shankar Mahadevan

Abstract: Offensive content moderation is vital on social media platforms to support healthy online discussions. However, for code-mixed Dravidian languages, existing work is limited to classifying whole comments without identifying the parts that contribute to offensiveness. This limitation is primarily due to the lack of annotated data for offensive spans. Accordingly, in this shared task, we provide Tamil-English code-mixed social comments with offensive spans. This paper outlines the released dataset, the methods, and the results of the submitted systems.

* System Description of Shared Task https://competitions.codalab.org/competitions/36395

TamilEmo: Finegrained Emotion Detection Dataset for Tamil

Feb 09, 2022

Charangan Vasantharajan, Sean Benhur, Prasanna Kumar Kumarasen, Rahul Ponnusamy, Sathiyaraj Thangasamy, Ruba Priyadharshini, Thenmozhi Durairaj, Kanchana Sivanraju, Anbukkarasi Sampath, Bharathi Raja Chakravarthi (+1 more)

Abstract: Emotion analysis from textual input is considered both a challenging and an interesting task in Natural Language Processing. However, due to the lack of datasets in low-resource languages (e.g., Tamil), it is difficult to conduct high-standard research in this area. We therefore introduce this labelled dataset (the largest manually annotated dataset, with more than 42k Tamil YouTube comments labelled for 31 emotions including neutral) for emotion recognition. The goal of this dataset is to improve emotion detection in multiple downstream tasks in Tamil. We have also created three different groupings of our emotions (3-class, 7-class, and 31-class) and evaluated model performance on each grouping. Our MuRIL-base model achieved a macro average F1-score of 0.60 on the 3-class grouping. On the 7-class and 31-class groupings, the Random Forest model performed well, with macro average F1-scores of 0.42 and 0.29, respectively.

* 11 pages, 4 figures
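
The abstract above reports macro average F1-scores, and a later entry on this page reports weighted average F1-scores. As a minimal sketch of how the two averages differ (the function names and the toy labels used with them are illustrative assumptions, not taken from the paper), both can be computed from per-class F1 as follows:

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return ({label: F1}, {label: support}) for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores, Counter(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores, _ = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's number of true examples."""
    scores, support = f1_per_class(y_true, y_pred)
    total = sum(support.values())
    return sum(scores[c] * support[c] for c in scores) / total
```

Because the macro average gives rare classes the same weight as frequent ones, it tends to be lower than the weighted average on imbalanced label sets, which is worth keeping in mind when comparing scores across the entries listed here.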

Findings of the Sentiment Analysis of Dravidian Languages in Code-Mixed Text

Nov 18, 2021

Bharathi Raja Chakravarthi, Ruba Priyadharshini, Sajeetha Thavareesan, Dhivya Chinnappa, Durairaj Thenmozhi, Elizabeth Sherly, John P. McCrae, Adeep Hande, Rahul Ponnusamy, Shubhanker Banerjee (+1 more)

Abstract: We present the results of the Dravidian-CodeMix shared task held at FIRE 2021, a track on sentiment analysis for Dravidian languages in code-mixed text. We describe the task, its organization, and the submitted systems. This shared task is the continuation of last year's Dravidian-CodeMix shared task held at FIRE 2020. This year's tasks included code-mixing at the intra-token and inter-token levels. Additionally, apart from Tamil and Malayalam, Kannada was also introduced. We received 22 systems for Tamil-English, 15 systems for Malayalam-English, and 15 for Kannada-English. The top systems for Tamil-English, Malayalam-English, and Kannada-English scored weighted average F1-scores of 0.711, 0.804, and 0.630, respectively. In summary, the quality and quantity of the submissions show that there is great interest in Dravidian languages in the code-mixed setting and that the state of the art in this domain still needs improvement.

Developing Successful Shared Tasks on Offensive Language Identification for Dravidian Languages

Nov 05, 2021

Bharathi Raja Chakravarthi, Dhivya Chinnappa, Ruba Priyadharshini, Anand Kumar Madasamy, Sangeetha Sivanesan, Subalalitha Chinnaudayar Navaneethakrishnan, Sajeetha Thavareesan, Dhanalakshmi Vadivel, Rahul Ponnusamy, Prasanna Kumar Kumaresan

Abstract: With the fast growth of mobile computing and Web technologies, offensive language has become more prevalent on social networking platforms. Since offensive language identification in local languages is essential for moderating social media content, in this paper we work with three under-resourced Dravidian languages, namely Malayalam, Tamil, and Kannada. We present an evaluation task, held as HASOC-DravidianCodeMix at FIRE 2020 and at DravidianLangTech at EACL 2021, designed to provide a framework for comparing different approaches to this problem. This paper describes the data creation, defines the task, lists the participating systems, and discusses various methods.

* 23

Dataset for Identification of Homophobia and Transophobia in Multilingual YouTube Comments

Sep 01, 2021

Bharathi Raja Chakravarthi, Ruba Priyadharshini, Rahul Ponnusamy, Prasanna Kumar Kumaresan, Kayalvizhi Sampath, Durairaj Thenmozhi, Sathiyaraj Thangasamy, Rajendran Nallathambi, John Phillip McCrae

Abstract: The increased proliferation of abusive content on social media platforms has a negative impact on online users. The dread, dislike, discomfort, or mistrust of lesbian, gay, transgender, or bisexual persons is defined as homophobia/transphobia. Homophobic/transphobic speech is a type of offensive language that may be summarized as hate speech directed toward LGBT+ people, and it has been a growing concern in recent years. Online homophobia/transphobia is a severe societal problem that can make online platforms poisonous and unwelcoming to LGBT+ people while also attempting to eliminate equality, diversity, and inclusion. We provide a new hierarchical taxonomy for online homophobia and transphobia, as well as an expert-labelled dataset that will allow homophobic/transphobic content to be automatically identified. Because this is a sensitive issue, we trained annotators and supplied them with comprehensive annotation rules; we had previously found that untrained crowdsourcing annotators struggle to identify homophobia due to cultural and other prejudices. The dataset comprises 15,141 annotated multilingual comments. This paper describes the process of building the dataset, a qualitative analysis of the data, and inter-annotator agreement. In addition, we create baseline models for the dataset. To the best of our knowledge, our dataset is the first such dataset created. Warning: This paper contains explicit statements of homophobia, transphobia, and stereotypes, which may be distressing to some readers.

* 44 Pages

DravidianMultiModality: A Dataset for Multi-modal Sentiment Analysis in Tamil and Malayalam

Jun 09, 2021

Bharathi Raja Chakravarthi, Jishnu Parameswaran P. K, Premjith B, K. P Soman, Rahul Ponnusamy, Prasanna Kumar Kumaresan, Kingston Pal Thamburaj, John P. McCrae

Abstract: Human communication is inherently multimodal and asynchronous. Analyzing human emotions and sentiment is an emerging field of artificial intelligence. We are witnessing an increasing amount of multimodal content in local languages on social media about products and other topics. However, there are not many multimodal resources available for under-resourced Dravidian languages. Our study aims to create a multimodal sentiment analysis dataset for the under-resourced Tamil and Malayalam languages. First, we downloaded product or movie review videos from YouTube for Tamil and Malayalam. Next, we created captions for the videos with the help of annotators. Then we labelled the videos for sentiment and verified the inter-annotator agreement using Fleiss's Kappa. This is the first multimodal sentiment analysis dataset for Tamil and Malayalam by volunteer annotators.

* 31
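
The abstract above mentions verifying inter-annotator agreement with Fleiss's Kappa. A minimal sketch of the standard statistic follows; the input layout (one row per item, one count per category) and the function name are assumptions for illustration, and the paper's actual number of annotators and sentiment categories is not stated on this page.

```python
def fleiss_kappa(ratings):
    """Fleiss's kappa for a table of category counts.

    ratings[i][j] is the number of annotators who assigned item i to
    category j. Every item must be rated by the same number of annotators.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    if n_raters < 2:
        raise ValueError("at least two annotators per item are required")
    if any(sum(row) != n_raters for row in ratings):
        raise ValueError("every item needs the same number of annotators")
    n_categories = len(ratings[0])

    # Share of all assignments that went to each category.
    p_cat = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item: fraction of annotator pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)  # agreement expected by chance

    if p_expected == 1.0:
        return float("nan")  # undefined when only one category is ever used
    return (p_observed - p_expected) / (1.0 - p_expected)
```

A value of 1 indicates perfect agreement, values near 0 indicate agreement no better than chance, and negative values indicate systematic disagreement.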
