Aligning language models to human preferences requires data that reveal those preferences. Ideally, time and money would be spent carefully collecting bespoke preference data tailored to each downstream application. In practice, however, a select few publicly available preference datasets are often used to train reward models for reinforcement learning from human feedback (RLHF). While new preference datasets are being introduced with increasing frequency, there are no existing efforts to measure and compare them. In this paper, we systematically study preference datasets through three perspectives: scale, label noise, and information content. We propose specific metrics for each perspective and uncover distinct axes along which preference datasets can be compared and better understood. Our work is a first step toward a data-centric approach to alignment, offering perspectives that aid training efficiency and guide iterative data collection for RLHF.