Debdoot Mukherjee

Multilingual and Multimodal Abuse Detection

Apr 03, 2022

Rini Sharon, Heet Shah, Debdoot Mukherjee, Vikram Gupta

Abstract: The presence of abusive content on social media platforms is undesirable, as it severely impedes healthy and safe social media interactions. While automatic abuse detection has been widely explored in the textual domain, audio abuse detection remains unexplored. In this paper, we attempt abuse detection in conversational audio from a multimodal perspective in a multilingual social media setting. Our key hypothesis is that, along with the modelling of audio, incorporating discriminative information from other modalities can be highly beneficial for this task. Our proposed method, MADA, explicitly focuses on two modalities other than the audio itself, namely the underlying emotions expressed in the abusive audio and the semantic information encapsulated in the corresponding textual form. Experiments show that MADA improves over audio-only approaches on the ADIMA dataset. We test the proposed approach on 10 different languages and observe consistent gains in the range of 0.6% to 5.2% by leveraging multiple modalities. We also perform extensive ablation experiments to study the contribution of each modality and observe the best results when leveraging all the modalities together. Additionally, we perform experiments to empirically confirm that there is a strong correlation between underlying emotions and abusive behaviour.

* Submitted to Interspeech 2022

3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social Media Short Videos

Mar 28, 2022

Vikram Gupta, Trisha Mittal, Puneet Mathur, Vaibhav Mishra, Mayank Maheshwari, Aniket Bera, Debdoot Mukherjee, Dinesh Manocha

Abstract: We present 3MASSIV, a multilingual, multimodal and multi-aspect, expertly annotated dataset of diverse short videos extracted from the short-video social media platform Moj. 3MASSIV comprises 50K short videos (20 seconds average duration) and 100K unlabeled videos in 11 different languages, and captures popular short-video trends such as pranks, fails, romance, and comedy, expressed via unique audio-visual formats like self-shot videos, reaction videos, lip-synching, self-sung songs, etc. 3MASSIV presents an opportunity for multimodal and multilingual semantic understanding of these unique videos by annotating them for concepts, affective states, media types, and audio language. We present a thorough analysis of 3MASSIV and highlight the variety and unique aspects of our dataset compared to other contemporary popular datasets with strong baselines. We also show how the social media content in 3MASSIV is dynamic and temporal in nature, which can be used for semantic understanding tasks and cross-lingual analysis.

* Accepted in CVPR 2022

ADIMA: Abuse Detection In Multilingual Audio

Feb 16, 2022

Vikram Gupta, Rini Sharon, Ramit Sawhney, Debdoot Mukherjee

Abstract: Abusive content detection in spoken text can be addressed by performing Automatic Speech Recognition (ASR) and leveraging advancements in natural language processing. However, ASR models introduce latency and often perform sub-optimally for profane words, as they are underrepresented in training corpora and not spoken clearly or completely. Exploration of this problem entirely in the audio domain has largely been limited by the lack of audio datasets. Building on these challenges, we propose ADIMA, a novel, linguistically diverse, ethically sourced, expert-annotated and well-balanced multilingual profanity detection audio dataset comprising 11,775 audio samples in 10 Indic languages, spanning 65 hours and spoken by 6,446 unique users. Through quantitative experiments across monolingual and cross-lingual zero-shot settings, we take the first step in democratizing audio-based content moderation in Indic languages and put forward our dataset to pave the way for future work.

Understanding Chat Messages for Sticker Recommendation in Hike Messenger

Feb 07, 2019

Abhishek Laddha, Mohamed Hanoosh, Debdoot Mukherjee

Abstract: Stickers are popularly used in messaging apps such as Hike to visually express a nuanced range of thoughts and utterances and to convey exaggerated emotions. However, discovering the right sticker at the right time in a chat from a large and ever-expanding pool of stickers can be cumbersome. In this paper, we describe a system for recommending stickers as users chat, based on what the user is typing and the conversational context. We decompose the sticker recommendation problem into two steps. First, we predict the next message that the user is likely to send in the chat. Second, we substitute the predicted message with an appropriate sticker. The majority of Hike's users transliterate messages from their native language to English. This leads to numerous orthographic variations of the same message and thus complicates message prediction. To address this issue, we cluster the messages that have the same meaning and predict the message cluster instead of the message. We experiment with different approaches to train embeddings for chat messages and study their efficacy in learning similar dense representations for messages that have the same intent. We propose a novel hybrid message prediction model, which can run with low latency on low-end phones that have severe computational limitations.

* 9 pages, Under submission in KDD Applied Data Science Track
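
A minimal sketch of the cluster-then-substitute idea described in the abstract above, under loud assumptions: the paper learns message embeddings, whereas this example swaps in a crude hand-written normalisation key to group orthographic variants, and it leaves out the next-message prediction step entirely. The names `variant_key`, `build_clusters`, `recommend_sticker` and the sticker identifier are hypothetical and do not come from the paper.

```python
import re
from collections import defaultdict


def variant_key(message: str) -> str:
    """Crude stand-in for a learned message cluster (an assumption, not the paper's method).

    Lowercases, keeps only letters and spaces, collapses repeated characters,
    and drops every vowel except a word's first character.
    """
    text = re.sub(r"[^a-z ]", "", message.lower())
    text = re.sub(r"(.)\1+", r"\1", text)  # "kyaa" -> "kya"
    words = [w[0] + re.sub(r"[aeiou]", "", w[1:]) for w in text.split()]
    return " ".join(words)


def build_clusters(messages):
    """Group messages that share the same normalisation key."""
    clusters = defaultdict(list)
    for message in messages:
        clusters[variant_key(message)].append(message)
    return dict(clusters)


def recommend_sticker(predicted_message, sticker_for_cluster):
    """Map a (predicted) message to its cluster, then to a sticker id, or None."""
    return sticker_for_cluster.get(variant_key(predicted_message))


clusters = build_clusters(["kya kar rahe ho", "kyaa kr rhe ho", "good night"])
stickers = {variant_key("good night"): "sticker_goodnight"}  # hypothetical id
suggestion = recommend_sticker("goood night", stickers)
```

Here the two transliterated spellings of the same question fall into one cluster, so a predictor would only need to choose among clusters rather than among every spelling.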

Heterogeneous Edge Embeddings for Friend Recommendation

Feb 07, 2019

Janu Verma, Srishti Gupta, Debdoot Mukherjee, Tanmoy Chakraborty

Abstract: We propose a friend recommendation system (an application of link prediction) using edge embeddings on social networks. Most real-world social networks are multi-graphs, where different kinds of relationships (e.g., chat, friendship) are possible between a pair of users. Existing network embedding techniques do not leverage signals from different edge types and thus perform inadequately on link prediction in such networks. We propose a method to mine network representations that effectively exploits heterogeneity in multi-graphs. We evaluate our model on a real-world, active social network where this system is deployed for friend recommendation for millions of users. Our method outperforms various state-of-the-art baselines on Hike's social network in terms of accuracy as well as user satisfaction.

* To appear in ECIR, 2019
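
To make the notion of an edge embedding over multiple edge types concrete, here is a small sketch. It assumes each edge type already has its own table of node embeddings and builds a feature vector for a candidate user pair by concatenating the element-wise (Hadamard) products per edge type. This is a common generic construction and is not claimed to be the method of the paper; the function names and the toy vectors are hypothetical.

```python
def hadamard(u, v):
    """Element-wise product of two equal-length vectors."""
    return [a * b for a, b in zip(u, v)]


def edge_features(user_a, user_b, embeddings_by_type):
    """Concatenate per-edge-type Hadamard products for a candidate user pair.

    embeddings_by_type maps an edge type (e.g. "chat") to a dict of
    user id -> node embedding learned on that edge type's graph.
    Edge types are visited in sorted order so the layout is stable.
    """
    features = []
    for edge_type in sorted(embeddings_by_type):
        table = embeddings_by_type[edge_type]
        features.extend(hadamard(table[user_a], table[user_b]))
    return features


# Toy embeddings for two edge types (illustrative values only).
embeddings = {
    "chat": {"u1": [1.0, 2.0], "u2": [3.0, 4.0]},
    "friendship": {"u1": [0.5, 0.0], "u2": [2.0, 1.0]},
}
pair_vector = edge_features("u1", "u2", embeddings)
```

The resulting vector could then be fed to any binary classifier that scores whether a link between the two users is likely.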
