Title: Direct Preference Optimization With Unobserved Preference Heterogeneity

May 23, 2024

Keertana Chidambaram, Karthik Vinay Seetharaman, Vasilis Syrgkanis

Figures 1 to 4 for Direct Preference Optimization With Unobserved Preference Heterogeneity

Abstract: Reinforcement learning from human feedback (RLHF) has emerged as a pivotal step in aligning language models with human objectives and values. It typically involves learning a reward model from human preference data and then using reinforcement learning to update the generative model accordingly. In contrast, Direct Preference Optimization (DPO) optimizes the generative model directly on preference data, skipping reinforcement learning. However, both RLHF and DPO assume uniform preferences, overlooking the reality of diverse human annotators. This paper presents a new method to align generative models with varied human preferences. We propose an Expectation-Maximization adaptation of DPO that generates a mixture of models based on the latent preference types of the annotators. We then introduce a min-max regret ensemble learning model that produces a single generative method minimizing worst-case regret among annotator subgroups with similar latent factors. Our algorithms leverage the simplicity of DPO while accommodating diverse preferences. Experimental results validate the effectiveness of our approach in producing equitable generative policies.
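To make the Expectation-Maximization idea concrete, here is a minimal Python sketch of what an E-step over latent annotator types could look like. It is an illustration under stated assumptions, not the authors' implementation: it assumes the standard DPO preference likelihood, sigmoid(beta * margin), where the margin is the policy-versus-reference log-ratio of the preferred response minus that of the rejected response, and it assumes those margins have already been computed for each candidate model. The names `e_step`, `update_prior`, and `margins_by_annotator` are hypothetical.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def e_step(margins_by_annotator, prior, beta=0.1):
    """Posterior probability of each latent type for each annotator.

    margins_by_annotator maps an annotator id to a list of rows, one row
    per preference pair; row[k] is the (assumed precomputed) DPO margin of
    that pair under the k-th candidate model.
    """
    posteriors = {}
    for annotator, rows in margins_by_annotator.items():
        # Log prior plus summed log-likelihood of the annotator's pairs.
        log_post = [
            math.log(prior[k]) + sum(log_sigmoid(beta * row[k]) for row in rows)
            for k in range(len(prior))
        ]
        # Normalize in log space to avoid underflow.
        top = max(log_post)
        weights = [math.exp(v - top) for v in log_post]
        total = sum(weights)
        posteriors[annotator] = [w / total for w in weights]
    return posteriors


def update_prior(posteriors):
    """Mixture-weight part of the M-step: average the posteriors."""
    n = len(posteriors)
    k = len(next(iter(posteriors.values())))
    return [sum(p[j] for p in posteriors.values()) / n for j in range(k)]
```

In a full EM loop, the remaining part of the M-step would retrain each candidate model with a DPO-style loss in which every preference pair is weighted by its annotator's posterior for that type; that training step is omitted here because the abstract does not specify it.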
