Abstract: In Reinforcement Learning from Human Feedback (RLHF), it is crucial to learn suitable reward models from human feedback to align large language models (LLMs) with human intentions. However, human feedback can often be noisy, inconsistent, or biased, especially when evaluating complex responses. Such feedback can lead to misaligned reward signals, potentially causing unintended side effects during the RLHF process. To address these challenges, we explore the use of influence functions to measure the impact of human feedback on the performance of reward models. We propose a compute-efficient approximation method that enables the application of influence functions to LLM-based reward models and large-scale preference datasets. In our experiments, we demonstrate two key applications of influence functions: (1) detecting common forms of labeler bias in human feedback datasets and (2) guiding labelers to refine their strategies to align more closely with expert feedback. By quantifying the impact of human feedback on reward models, we believe that influence functions can enhance feedback interpretability and contribute to scalable oversight in RLHF, helping labelers provide more accurate and consistent feedback. Source code is available at https://github.com/mintaywon/IF_RLHF.
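The abstract does not specify the compute-efficient approximation, so the sketch below only illustrates the general idea of influence scoring for a preference-trained reward model: each training preference pair is scored by the alignment between its Bradley-Terry loss gradient and a validation loss gradient (a first-order, Hessian-free heuristic in the spirit of TracIn). The model interface, function names, and the gradient-dot-product scoring are illustrative assumptions, not the paper's method.

```python
# Minimal sketch: first-order influence scores for a Bradley-Terry reward model.
# Assumes reward_model(input_ids) returns one scalar reward per sequence.
import torch
import torch.nn.functional as F

def preference_loss(reward_model, chosen_ids, rejected_ids):
    """Bradley-Terry preference loss: -log sigmoid(r(chosen) - r(rejected))."""
    r_chosen = reward_model(chosen_ids)
    r_rejected = reward_model(rejected_ids)
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def flat_grad(loss, params):
    """Gradient of a scalar loss, flattened into a single vector."""
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def influence_scores(reward_model, train_pairs, val_pairs):
    """Score each training pair by the dot product between its loss gradient
    and the validation loss gradient (ignores the Hessian of the exact
    influence function)."""
    params = [p for p in reward_model.parameters() if p.requires_grad]

    # Gradient of the average validation loss.
    val_loss = sum(preference_loss(reward_model, c, r) for c, r in val_pairs) / len(val_pairs)
    g_val = flat_grad(val_loss, params)

    scores = []
    for chosen, rejected in train_pairs:
        g_train = flat_grad(preference_loss(reward_model, chosen, rejected), params)
        # Positive: the pair pushes parameters in a direction that also reduces
        # validation loss; strongly negative: candidate noisy or biased label.
        scores.append(torch.dot(g_val, g_train).item())
    return scores
```

In this kind of workflow, the lowest-scoring pairs would be surfaced for relabeling or labeler feedback; the exact ranking criterion and efficiency tricks used in the paper may differ.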
Abstract: In this paper, we propose ComGAN (Comparative GAN), which allows the generator in GANs to refer to the semantics of comparative samples (e.g., real data) through comparison. ComGAN generalizes relativistic GANs by allowing arbitrary architectures, and it mostly outperforms relativistic GANs with a simple input-concatenation architecture. To train the discriminator in ComGAN, we also propose equality regularization, which fits the discriminator to a neutral label on pairs of equally real or equally fake samples. Equality regularization substantially boosts the performance of ComGAN, including WGAN, while being exceptionally simple compared to existing regularizations. Finally, we generalize the comparative samples, which are fixed to real data in relativistic GANs, to fake data, and we show that such objectives are sound in both theory and practice. Our experiments demonstrate the superior performance of ComGAN and equality regularization, achieving the best FIDs in 7 out of 8 combinations of losses and datasets against ordinary GANs and relativistic GANs.
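The abstract describes equality regularization only at a high level, so the following PyTorch sketch is a hypothetical reading of it: a comparative discriminator scores a sample against a comparative sample via input concatenation, and pairs that are equally real (or equally fake) are fit to a neutral label of 0.5. The architecture, loss form, neutral-label value, and the weight lambda_eq are assumptions for illustration, not the paper's exact objective.

```python
# Hypothetical sketch of a comparative discriminator with equality regularization.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ComparativeDiscriminator(nn.Module):
    """Simple input-concatenation comparative discriminator (assumed MLP on flat features)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
        )

    def forward(self, x, comparative):
        # Logit > 0 reads as "x is more real than the comparative sample".
        return self.net(torch.cat([x, comparative], dim=1))

def discriminator_loss(D, real, fake, lambda_eq=1.0):
    # Comparative classification: real should beat a fake comparative sample, and vice versa.
    logits_rf = D(real, fake)
    logits_fr = D(fake, real)
    loss_cls = (F.binary_cross_entropy_with_logits(logits_rf, torch.ones_like(logits_rf)) +
                F.binary_cross_entropy_with_logits(logits_fr, torch.zeros_like(logits_fr)))

    # Equality regularization: pairs of equally real (or equally fake) samples
    # are fit to the neutral label 0.5.
    logits_rr = D(real, real.roll(1, 0))
    logits_ff = D(fake, fake.roll(1, 0))
    loss_eq = (F.binary_cross_entropy_with_logits(logits_rr, torch.full_like(logits_rr, 0.5)) +
               F.binary_cross_entropy_with_logits(logits_ff, torch.full_like(logits_ff, 0.5)))

    return loss_cls + lambda_eq * loss_eq
```

The shifted in-batch pairing (roll) is just one cheap way to form equally-real or equally-fake pairs; how the comparative samples are actually drawn, and how the generator objective uses them, follows the paper rather than this sketch.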