Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Mitigating Sycophancy in Decoder-Only Transformer Architectures: Synthetic Data Intervention

Nov 15, 2024

Libo Wang

Figure 1 for Mitigating Sycophancy in Decoder-Only Transformer Architectures: Synthetic Data Intervention

Figure 2 for Mitigating Sycophancy in Decoder-Only Transformer Architectures: Synthetic Data Intervention

Figure 3 for Mitigating Sycophancy in Decoder-Only Transformer Architectures: Synthetic Data Intervention

Figure 4 for Mitigating Sycophancy in Decoder-Only Transformer Architectures: Synthetic Data Intervention

Share this with someone who'll enjoy it:

Abstract:To address the sycophancy problem caused by reinforcement learning from human feedback in large language models, this research applies synthetic data intervention technology to the decoder-only transformer architecture. Based on the research gaps in the existing literature, the researcher designed an experimental process to reduce the tendency of models to cater by generating diversified data, and used GPT4o as an experimental tool for verification. The experiment used 100 true and false questions, and compared the performance of the model trained with synthetic data intervention and the original untrained model on multiple indicators. The results show that the SDI training model supports the technology in terms of accuracy rate and sycophancy rate and has significant effectiveness in reducing sycophancy phenomena. Notably, the data set, experimental process, code and data results have been uploaded to Github, the link is https://github.com/brucewang123456789/GeniusTrail.git.

* This research is also submitted to OpenReview. The main text is 9 pages (excluding citations), 7 figures, and 1 table

View paper on

Share this with someone who'll enjoy it:

Title:Mitigating Sycophancy in Decoder-Only Transformer Architectures: Synthetic Data Intervention

Paper and Code