Aman Khullar

Hate Speech Detection in Limited Data Contexts using Synthetic Data Generation

Oct 04, 2023

Aman Khullar, Daniel Nkemelu, Cuong V. Nguyen, Michael L. Best

Abstract: A growing body of work has focused on text classification methods for detecting the increasing amount of hate speech posted online. This progress has been limited to a select number of high-resource languages, causing detection systems to either under-perform or not exist in limited data contexts. This is largely due to a lack of training data, which is expensive to collect and curate in these settings. In this work, we propose a data augmentation approach that addresses the lack of data for online hate speech detection in limited data contexts using synthetic data generation techniques. Given a handful of hate speech examples in a high-resource language such as English, we present three methods to synthesize new examples of hate speech data in a target language that retain the hate sentiment in the original examples but transfer the hate targets. We apply our approach to generate training data for hate speech classification tasks in Hindi and Vietnamese. Our findings show that a model trained on synthetic data performs comparably to, and in some cases outperforms, a model trained only on the samples available in the target domain. This method can be adopted to bootstrap hate speech detection models from scratch in limited data contexts. As the growth of social media within these contexts continues to outstrip response efforts, this work furthers our capacities for detection, understanding, and response to hate speech.

* Accepted at ACM Journal on Computing and Sustainable Societies

MAST: Multimodal Abstractive Summarization with Trimodal Hierarchical Attention

Oct 15, 2020

Aman Khullar, Udit Arora

Abstract: This paper presents MAST, a new model for Multimodal Abstractive Text Summarization that utilizes information from all three modalities -- text, audio and video -- in a multimodal video. Prior work on multimodal abstractive text summarization only utilized information from the text and video modalities. We examine the usefulness and challenges of deriving information from the audio modality and present a sequence-to-sequence trimodal hierarchical attention-based model that overcomes these challenges by letting the model pay more attention to the text modality. MAST outperforms the current state-of-the-art (video-text) model by 2.51 points in terms of Content F1 score and 1.00 points in terms of ROUGE-L score on the How2 dataset for multimodal language understanding.

* To appear in the first EMNLP Workshop on NLP Beyond Text, 2020. Aman Khullar and Udit Arora contributed equally.
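
The general idea of hierarchical attention over three modalities, as named in the MAST abstract, can be illustrated with a minimal sketch. This is not the authors' architecture: the abstract does not give the scoring function or the mechanism that favors the text modality, so the plain dot-product scoring and the names `attend` and `hierarchical_attend` are assumptions made for illustration only. The first level attends within each modality, and the second level attends across the resulting per-modality context vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys):
    """Dot-product attention (an assumed scoring function).

    Returns the attention weights over `keys` and the weighted
    average of the key vectors (the context vector).
    """
    weights = softmax([dot(query, k) for k in keys])
    dim = len(keys[0])
    context = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]
    return weights, context


def hierarchical_attend(query, modalities):
    """Two-level attention over a dict of modality name -> encoder states.

    Level 1: one context vector per modality.
    Level 2: attention across those context vectors.
    Returns the per-modality weights and the fused context vector.
    """
    names = list(modalities)
    contexts = [attend(query, modalities[name])[1] for name in names]
    modality_weights, fused = attend(query, contexts)
    return dict(zip(names, modality_weights)), fused


# Toy decoder state and encoder states (made-up numbers, not real features).
decoder_state = [1.0, 0.0]
encoder_states = {
    "text": [[1.0, 0.0], [0.5, 0.5]],
    "audio": [[0.0, 1.0]],
    "video": [[0.2, 0.8], [0.9, 0.1]],
}
weights, fused = hierarchical_attend(decoder_state, encoder_states)
print(weights)
print(fused)
```

In this sketch the second-level weights show how much each modality contributes to the fused context at a given decoding step; a real model would learn projection matrices for the scores and not use raw dot products.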
