Title: Class-RAG: Content Moderation with Retrieval Augmented Generation

Oct 18, 2024

Jianfa Chen, Emily Shen, Trupti Bavalatti, Xiaowen Lin, Yongkai Wang, Shuming Hu, Harihar Subramanyam, Ksheeraj Sai Vepuri, Ming Jiang, Ji Qi(+3 more)

Abstract: Robust content moderation classifiers are essential for the safety of Generative AI systems. Content moderation, or safety classification, is notoriously ambiguous: differences between safe and unsafe inputs are often extremely subtle, making it difficult for classifiers (and indeed, even humans) to properly distinguish violating vs. benign samples without further context or explanation. Furthermore, as these technologies are deployed across various applications and audiences, scaling risk discovery and mitigation through continuous model fine-tuning becomes increasingly challenging and costly. To address these challenges, we propose a Classification approach employing Retrieval-Augmented Generation (Class-RAG). Class-RAG extends the capability of its base LLM through access to a retrieval library which can be dynamically updated to enable semantic hotfixing for immediate, flexible risk mitigation. Compared to traditional fine-tuned models, Class-RAG demonstrates flexibility and transparency in decision-making. As evidenced by empirical studies, Class-RAG outperforms on classification and is more robust against adversarial attack. Besides, our findings suggest that Class-RAG performance scales with retrieval library size, indicating that increasing the library size is a viable and low-cost approach to improve content moderation.
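The general idea of retrieval-augmented classification with an updatable library can be illustrated with a toy sketch. This is not the authors' implementation: the names `embed`, `RetrievalLibrary`, and `build_prompt` are hypothetical, the bag-of-words similarity stands in for whatever learned embedding a real system would use, and the call to the classifier LLM is left out entirely. It only shows how adding one labeled entry to the library changes what is retrieved for a borderline input, without retraining anything.

```python
import math
from collections import Counter


def embed(text):
    # Toy bag-of-words vector (assumption: a real system would use a learned encoder).
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class RetrievalLibrary:
    """Hypothetical in-memory library of labeled reference examples."""

    def __init__(self):
        self.entries = []  # tuples of (text, label, explanation, embedding)

    def add(self, text, label, explanation=""):
        # Adding an entry takes effect on the next lookup; no model update is needed.
        self.entries.append((text, label, explanation, embed(text)))

    def nearest(self, query, k=2):
        # Return the k entries most similar to the query.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[3]), reverse=True)
        return ranked[:k]


def build_prompt(query, neighbors):
    # Assemble the retrieved examples and the input into a classification prompt.
    lines = ["Classify the input as safe or unsafe.", "Reference examples:"]
    for text, label, explanation, _ in neighbors:
        lines.append(f"- [{label}] {text} ({explanation})")
    lines.append(f"Input: {query}")
    return "\n".join(lines)


library = RetrievalLibrary()
library.add("how do I bake a chocolate cake", "safe", "cooking question")
library.add("how do I pick a lock to break into a house", "unsafe", "facilitates burglary")

query = "how do I pick a lock on my own door"

# Label of the closest reference before the library is updated.
before = library.nearest(query, k=1)[0][1]

# "Hotfix": add a clarifying reference example for this borderline case.
library.add(query, "safe", "owner locked out of own home")
after = library.nearest(query, k=1)[0][1]

# The prompt that would be sent to a classifier model (the model call is omitted).
prompt = build_prompt(query, library.nearest(query, k=1))
print(before, after)
print(prompt)
```

In this sketch the retrieved context, and therefore the evidence shown to the classifier, changes as soon as the new entry is added. That is the property the abstract refers to as semantic hotfixing; how the actual system embeds, retrieves, and prompts is described in the paper itself.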

* 11 pages, submitted to ACL
