Title: Facilitating NSFW Text Detection in Open-Domain Dialogue Systems via Knowledge Distillation

Sep 19, 2023

Huachuan Qiu, Shuai Zhang, Hongliang He, Anqi Li, Zhenzhong Lan

Abstract: NSFW (Not Safe for Work) content, in the context of a dialogue, can have severe side effects on users in open-domain dialogue systems. However, research on detecting NSFW language, especially sexually explicit content, within a dialogue context has lagged significantly. To address this issue, we introduce CensorChat, a dialogue monitoring dataset aimed at NSFW dialogue detection. Leveraging knowledge distillation techniques involving GPT-4 and ChatGPT, this dataset offers a cost-effective means of constructing NSFW content detectors. The process entails collecting real-life human-machine interaction data and breaking it down into single utterances and single-turn dialogues, with the chatbot delivering the final utterance. ChatGPT is employed to annotate the unlabeled data, which then serves as the training set. Rationale validation and test sets are constructed using ChatGPT and GPT-4 as annotators, with a self-criticism strategy for resolving discrepancies in labeling. A BERT model is fine-tuned as a text classifier on the pseudo-labeled data, and its performance is assessed. The study emphasizes the importance of AI systems prioritizing user safety and well-being in digital conversations while respecting freedom of expression. The proposed approach not only advances NSFW content detection but also aligns with evolving user protection needs in AI-driven dialogues.

* Submitted to ICASSP 2024. Code and data are publicly available at https://github.com/qiuhuachuan/CensorChat
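
The decomposition step described in the abstract (splitting a human-machine conversation into single utterances and single-turn dialogues that end with the chatbot's reply) could look roughly like the sketch below. This is only an illustration, not the authors' code: the `speaker`/`text` field names, the `decompose` function, and the sample conversation are all hypothetical, and the actual pipeline is in the linked repository.

```python
from typing import Dict, List, Tuple

# Hypothetical turn format: {"speaker": "user" or "bot", "text": "..."}
Turn = Dict[str, str]


def decompose(dialogue: List[Turn]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split one conversation into the two kinds of samples the abstract mentions.

    Returns:
        utterances: every turn on its own (single utterances).
        pairs: (user_text, bot_text) for each user turn that is immediately
            followed by a bot turn, so the chatbot delivers the final utterance.
    """
    utterances = [turn["text"] for turn in dialogue]
    pairs = []
    # Walk over adjacent turns and keep only user -> bot exchanges.
    for prev, cur in zip(dialogue, dialogue[1:]):
        if prev["speaker"] == "user" and cur["speaker"] == "bot":
            pairs.append((prev["text"], cur["text"]))
    return utterances, pairs


# Made-up conversation, for illustration only.
sample = [
    {"speaker": "user", "text": "Hi there"},
    {"speaker": "bot", "text": "Hello! How can I help?"},
    {"speaker": "user", "text": "Tell me a story"},
    {"speaker": "bot", "text": "Once upon a time..."},
]

utterances, pairs = decompose(sample)
```

Each resulting utterance or pair would then be one unit to label (by ChatGPT for the training set, per the abstract) before fine-tuning a classifier on the pseudo-labels.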
