Title: Direct Judgement Preference Optimization

Sep 23, 2024

Peifeng Wang, Austin Xu, Yilun Zhou, Caiming Xiong, Shafiq Joty

Abstract: Auto-evaluation is crucial for assessing response quality and offering feedback for model development. Recent studies have explored training large language models (LLMs) as generative judges to evaluate and critique other models' outputs. In this work, we investigate the idea of learning from both positive and negative data with preference optimization to enhance the evaluation capabilities of LLM judges across an array of different use cases. We achieve this by employing three approaches to collect the preference pairs for different use cases, each aimed at improving our generative judge from a different perspective. Our comprehensive study over a wide range of benchmarks demonstrates the effectiveness of our method. In particular, our generative judge achieves the best performance on 10 out of 13 benchmarks, outperforming strong baselines like GPT-4o and specialized judge models. Further analysis shows that our judge model robustly counters inherent biases such as position and length bias, flexibly adapts to any evaluation protocol specified by practitioners, and provides helpful language feedback for improving downstream generator models.
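The abstract does not state the training objective in detail. As an illustration only, the sketch below assumes the standard DPO (direct preference optimization) loss on a single preference pair, where a "chosen" judgement is the positive example and a "rejected" judgement is the negative one. The function name `dpo_loss`, its parameter names, and the default `beta=0.1` are hypothetical and are not taken from the paper.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative assumption,
    not necessarily the exact objective used by the authors).

    Each argument is the summed log-probability of a full judgement
    under either the trained policy or the frozen reference model.
    """
    # How much more the policy likes each judgement than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # loss = -log(sigmoid(x)) = log(1 + exp(-x)), computed stably.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# With no preference signal the loss equals log(2); it falls as the policy
# favours the chosen judgement more strongly than the reference does.
neutral = dpo_loss(-10.0, -10.0, -10.0, -10.0)
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)
```

Under this assumption, the loss decreases only when the policy raises the likelihood of the positive judgement relative to the negative one, which is the sense in which both kinds of data contribute to training.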

* Preprint
