Tinh Son Luong

ToVo: Toxicity Taxonomy via Voting

Jun 21, 2024

Tinh Son Luong, Thanh-Thien Le, Thang Viet Doan, Linh Ngo Van, Thien Huu Nguyen, Diep Thi-Ngoc Nguyen

Abstract: Existing toxicity detection models face significant limitations, such as a lack of transparency, customization, and reproducibility. These challenges stem from the closed-source nature of their training data and the paucity of explanations for their evaluation mechanisms. To address these issues, we propose a dataset creation mechanism that integrates voting and chain-of-thought processes, producing a high-quality open-source dataset for toxic content detection. Our methodology ensures diverse classification metrics for each sample and includes both classification scores and explanatory reasoning for the classifications. We use the dataset created through our proposed mechanism to train our model, which is then compared against existing widely used detectors. Our approach not only enhances transparency and customizability but also facilitates better fine-tuning for specific use cases. This work contributes a robust framework for developing toxic content detection models, emphasizing openness and adaptability, thus paving the way for more effective and user-specific content moderation solutions.
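The abstract does not say exactly how the votes are combined. The following is only a hypothetical illustration of voting-based labelling, not the ToVo authors' method. It assumes that several judges each return a yes/no flag per toxicity category along with a chain-of-thought rationale. Judges could be, for example, different LLM prompts. For each category, the sketch sets the score to the fraction of judges that flagged it. The names `Judgment` and `aggregate` and the 0.5 threshold are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """One (hypothetical) judge's verdict: per-category flags plus its reasoning."""
    flags: dict[str, bool]
    rationale: str


def aggregate(judgments: list[Judgment], threshold: float = 0.5) -> dict[str, dict]:
    """Combine several judgments into per-category scores, labels, and rationales.

    Assumption: a category missing from a judge's flags counts as a "no" vote.
    """
    if not judgments:
        raise ValueError("at least one judgment is required")
    # Union of all categories any judge mentioned, in a stable order.
    categories = sorted({cat for j in judgments for cat in j.flags})
    result = {}
    for cat in categories:
        votes = [j.flags.get(cat, False) for j in judgments]
        score = sum(votes) / len(votes)  # fraction of judges voting "yes"
        result[cat] = {
            "score": score,
            "label": score > threshold,  # assumed majority-style cut-off
            # Keep the reasoning only from judges that voted for the flag.
            "rationales": [j.rationale for j, v in zip(judgments, votes) if v],
        }
    return result
```

For example, suppose two of three judges flag a comment as an insult. Its insult score is then 2/3, and under the assumed 0.5 threshold it is labelled an insult. Keeping both the fractional score and the rationales mirrors the abstract's point that each sample carries classification scores and explanatory reasoning, not just a single hard label.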

Realistic Evaluation of Toxicity in Large Language Models

May 17, 2024

Tinh Son Luong, Thanh-Thien Le, Linh Ngo Van, Thien Huu Nguyen

Abstract: Large language models (LLMs) have become integral to our professional workflows and daily lives. Nevertheless, these machine companions of ours have a critical flaw: the huge amount of data that endows them with vast and diverse knowledge also exposes them to inevitable toxicity and bias. While most LLMs incorporate defense mechanisms to prevent the generation of harmful content, these safeguards can be easily bypassed with minimal prompt engineering. In this paper, we introduce the new Thoroughly Engineered Toxicity (TET) dataset, comprising manually crafted prompts designed to nullify the protective layers of such models. Through extensive evaluations, we demonstrate the pivotal role of TET in providing a rigorous benchmark for evaluating toxicity awareness in several popular LLMs: it highlights toxicity in the LLMs that might remain hidden when using normal prompts, thus revealing subtler issues in their behavior.