Neemesh Yadav

Revealing Hidden Mechanisms of Cross-Country Content Moderation with Natural Language Processing

Mar 07, 2025

Neemesh Yadav, Jiarui Liu, Francesco Ortu, Roya Ensafi, Zhijing Jin, Rada Mihalcea

Abstract: The ability of Natural Language Processing (NLP) methods to categorize text into multiple classes has motivated their use in online content moderation tasks, such as hate speech and fake news detection. However, there is limited understanding of how or why these methods make such decisions, or why certain content is moderated in the first place. To investigate the hidden mechanisms behind content moderation, we explore multiple directions: 1) training classifiers to reverse-engineer content moderation decisions across countries; 2) explaining content moderation decisions by analyzing Shapley values and LLM-guided explanations. Our primary focus is on content moderation decisions made across countries, using pre-existing corpora sampled from the Twitter Stream Grab. Our experiments reveal interesting patterns in censored posts, both across countries and over time. Through human evaluations of LLM-generated explanations across three LLMs, we assess the effectiveness of using LLMs in content moderation. Finally, we discuss potential future directions, as well as the limitations and ethical considerations of this work. Our code and data are available at https://github.com/causalNLP/censorship

QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs

Dec 16, 2024

Mohammad Aflah Khan, Neemesh Yadav, Sarah Masud, Md. Shad Akhtar

Abstract: The rise of large language models (LLMs) has created a need for advanced benchmarking systems beyond traditional setups. To this end, we introduce QUENCH, a novel text-based English Quizzing Benchmark manually curated and transcribed from YouTube quiz videos. QUENCH contains masked entities and rationales for the LLMs to predict via generation. At the intersection of geographical context and common sense reasoning, QUENCH helps assess the world knowledge and deduction capabilities of LLMs via a zero-shot, open-domain quizzing setup. We perform an extensive evaluation on 7 LLMs and 4 metrics, investigating the influence of model size, prompting style, geographical context, and gold-labeled rationale generation. The benchmarking concludes with an analysis of the errors to which the LLMs are prone.

* 17 Pages, 6 Figures, 8 Tables, COLING 2025

Tox-BART: Leveraging Toxicity Attributes for Explanation Generation of Implicit Hate Speech

Jun 06, 2024

Neemesh Yadav, Sarah Masud, Vikram Goyal, Md Shad Akhtar, Tanmoy Chakraborty

Abstract: Employing language models to generate explanations for an incoming implicit hate post is an active area of research. The explanation is intended to make explicit the underlying stereotype and aid content moderators. The training often combines top-k relevant knowledge graph (KG) tuples to provide world knowledge and improve performance on standard metrics. Interestingly, our study presents conflicting evidence for the role of the quality of KG tuples in generating implicit explanations. Consequently, simpler models incorporating external toxicity signals outperform KG-infused models. Compared to the KG-based setup, we observe comparable performance on the SBIC (LatentHatred) datasets, with a performance variation of +0.44 (+0.49), +1.83 (-1.56), and -4.59 (+0.77) in BLEU, ROUGE-L, and BERTScore, respectively. Further human evaluation and error analysis reveal that our proposed setup produces more precise explanations than zero-shot GPT-3.5, highlighting the intricate nature of the task.

* 17 Pages, 5 Figures, 13 Tables, ACL Findings 2024

The Art of Embedding Fusion: Optimizing Hate Speech Detection

Jun 26, 2023

Mohammad Aflah Khan, Neemesh Yadav, Mohit Jain, Sanyam Goyal

Abstract: Hate speech detection is a challenging natural language processing task that requires capturing linguistic and contextual nuances. Pre-trained language models (PLMs) offer rich semantic representations of text that can improve this task. However, there is still limited knowledge about ways to effectively combine representations across PLMs and leverage their complementary strengths. In this work, we shed light on various combination techniques for several PLMs and comprehensively analyze their effectiveness. Our findings show that combining embeddings leads to slight improvements, but at a high computational cost, and that the choice of combination has a marginal effect on the final outcome. We also make our codebase public at https://github.com/aflah02/The-Art-of-Embedding-Fusion-Optimizing-Hate-Speech-Detection
