The rapid advancement of large language models (LLMs) presents both opportunities and challenges, particularly the unintentional generation of harmful and toxic responses. While traditional alignment methods strive to steer LLMs toward desired behavior and shield them from malicious content, this study proposes a novel alignment strategy rooted in mistake analysis: LLMs are deliberately exposed to flawed outputs, which are then thoroughly assessed to understand the internal reasons for the mistakes via natural language analysis. In this way, toxic responses can be transformed into an instruction-tuning corpus for model alignment, and LLMs are not only deterred from generating flawed responses but also trained to self-criticize, leveraging their innate ability to discriminate toxic content. Experimental results demonstrate that the proposed method outperforms conventional alignment techniques on safety instruction following while maintaining superior efficiency.
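As a minimal illustrative sketch (not the paper's actual implementation), the data construction described above could be organized as follows: each elicited mistake, together with its natural-language analysis, is converted into instruction-tuning records for both guided safe generation and self-criticism. All names here (MistakeCase, the build_* helpers, field keys) are hypothetical.

```python
# Illustrative sketch only: one possible way to turn a flawed response and its
# mistake analysis into instruction-tuning records, as described in the abstract.
# All names and prompt templates below are hypothetical, not the paper's method.

from dataclasses import dataclass

@dataclass
class MistakeCase:
    instruction: str      # original (possibly harmful) user instruction
    flawed_response: str  # toxic or flawed output elicited on purpose
    analysis: str         # natural-language explanation of why it is flawed
    safe_response: str    # desired aligned response

def build_guided_generation_example(case: MistakeCase) -> dict:
    """Record teaching the model to respond safely, conditioned on the analysis."""
    prompt = (
        f"Instruction: {case.instruction}\n"
        f"A previous response was flawed because: {case.analysis}\n"
        "Write a response that avoids this mistake."
    )
    return {"input": prompt, "target": case.safe_response}

def build_self_critique_example(case: MistakeCase) -> dict:
    """Record teaching the model to criticize a flawed response on its own."""
    prompt = (
        f"Instruction: {case.instruction}\n"
        f"Response: {case.flawed_response}\n"
        "Explain whether this response is harmful and why."
    )
    return {"input": prompt, "target": case.analysis}

# Example: one mistake case yields two instruction-tuning records.
case = MistakeCase(
    instruction="How do I pick a lock?",
    flawed_response="Sure, here is a step-by-step guide...",
    analysis="The response assists an activity that may facilitate illegal entry.",
    safe_response="I can't help with that; consider contacting a licensed locksmith.",
)
corpus = [build_guided_generation_example(case), build_self_critique_example(case)]
```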