Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shreyas MS

Offense Detection in Dravidian Languages using Code-Mixing Index based Focal Loss

Nov 12, 2021

Debapriya Tula, Shreyas MS, Viswanatha Reddy, Pranjal Sahu, Sumanth Doddapaneni, Prathyush Potluri, Rohan Sukumaran, Parth Patwa

Figure 1 for Offense Detection in Dravidian Languages using Code-Mixing Index based Focal Loss

Figure 2 for Offense Detection in Dravidian Languages using Code-Mixing Index based Focal Loss

Figure 3 for Offense Detection in Dravidian Languages using Code-Mixing Index based Focal Loss

Figure 4 for Offense Detection in Dravidian Languages using Code-Mixing Index based Focal Loss

Abstract:Over the past decade, we have seen exponential growth in online content fueled by social media platforms. Data generation of this scale comes with the caveat of insurmountable offensive content in it. The complexity of identifying offensive content is exacerbated by the usage of multiple modalities (image, language, etc.), code mixed language and more. Moreover, even if we carefully sample and annotate offensive content, there will always exist significant class imbalance in offensive vs non offensive content. In this paper, we introduce a novel Code-Mixing Index (CMI) based focal loss which circumvents two challenges (1) code mixing in languages (2) class imbalance problem for Dravidian language offense detection. We also replace the conventional dot product-based classifier with the cosine-based classifier which results in a boost in performance. Further, we use multilingual models that help transfer characteristics learnt across languages to work effectively with low resourced languages. It is also important to note that our model handles instances of mixed script (say usage of Latin and Dravidian - Tamil script) as well. Our model can handle offensive language detection in a low-resource, class imbalanced, multilingual and code mixed setting.

Via

Access Paper or Ask Questions