Biases and stereotypes in Large Language Models (LLMs) can have negative implications for user experience and societal outcomes. Current approaches to bias mitigation, such as Reinforcement Learning from Human Feedback (RLHF), rely on costly manual feedback. Although LLMs are capable of understanding logic and identifying biases in text, they often struggle to acknowledge and address their own biases, owing to factors such as prompt influence, internal mechanisms, and policies. We found that informing LLMs that the content they generate is not their own, and then questioning them about potential biases in the text, can significantly improve their ability to recognize and mitigate those biases. Based on this finding, we propose RLRF (Reinforcement Learning from Reflection through Debates as Feedback), which replaces human feedback with AI feedback for bias mitigation. RLRF engages LLMs in multi-role debates to expose biases and gradually reduce them at each iteration using a ranking-based scoring mechanism. The resulting dialogues are then used to construct a dataset of high-bias and low-bias instances for training the reward model in reinforcement learning. This dataset can be generated by the same LLM through self-reflection, or by a superior LLM guiding the former in a student-teacher mode to enhance its logical reasoning abilities. Experimental results demonstrate that our approach is highly effective at reducing bias.
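The following is a minimal sketch of the debate-and-ranking data-collection loop described above, assuming a generic `query_llm(role, prompt)` chat interface. All role names, prompts, the `bias_score` heuristic, and the pairing strategy are illustrative placeholders, not the exact implementation proposed in the paper.

```python
# Hypothetical sketch of the RLRF data-collection loop: a generator produces text,
# a critic is told the text is NOT its own and is asked to identify biases, the
# generator revises, and ranked revisions become (high-bias, low-bias) pairs for
# reward-model training. All names and prompts here are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Tuple


def query_llm(role: str, prompt: str) -> str:
    """Placeholder for an LLM call; swap in a real API client."""
    return f"[{role}] response to: {prompt[:40]}..."


def bias_score(critic_feedback: str) -> float:
    """Placeholder ranking score (0 = unbiased, 1 = highly biased)."""
    return 0.5  # replace with the critic's numeric ranking


@dataclass
class DebateRecord:
    statement: str
    score: float


def multi_role_debate(topic: str, rounds: int = 3) -> List[DebateRecord]:
    """Run a multi-role debate in which each revision is scored so that
    bias is gradually reduced across iterations."""
    records: List[DebateRecord] = []
    statement = query_llm("generator", f"Write a short passage about: {topic}")
    for _ in range(rounds):
        # The critic is framed as reviewing someone else's text, which the
        # abstract reports improves bias recognition.
        critique = query_llm(
            "critic",
            "The following text was written by someone else. "
            f"Identify any biases or stereotypes in it:\n{statement}",
        )
        records.append(DebateRecord(statement, bias_score(critique)))
        statement = query_llm(
            "generator",
            f"Revise the passage to address this feedback:\n{critique}\n\n{statement}",
        )
    records.append(DebateRecord(statement, bias_score("final revision")))
    return records


def to_preference_pairs(records: List[DebateRecord]) -> List[Tuple[str, str]]:
    """Pair high-bias (rejected) with low-bias (chosen) instances for the reward model."""
    ranked = sorted(records, key=lambda r: r.score)
    return [
        (high.statement, low.statement)
        for low, high in zip(ranked, reversed(ranked))
        if high.score > low.score
    ]


if __name__ == "__main__":
    pairs = to_preference_pairs(multi_role_debate("leadership styles across genders"))
    print(f"Collected {len(pairs)} (high-bias, low-bias) training pairs")
```

The same loop can be instantiated either with a single model playing both roles (self-reflection) or with a stronger model as the critic (student-teacher mode), as described above; the collected pairs then feed a standard reward-model training step.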