Title: PairEval: Open-domain Dialogue Evaluation with Pairwise Comparison

Apr 01, 2024

ChaeHun Park, Minseok Choi, Dohyun Lee, Jaegul Choo

Abstract: Building a reliable, automated evaluation metric is a necessary but challenging problem for open-domain dialogue systems. Recent studies have proposed evaluation metrics that assess generated responses by considering their relevance to the preceding dialogue history. Although effective, these metrics score each response in isolation rather than considering its quality relative to other responses. To address this, we propose PairEval, a novel dialogue evaluation metric that assesses a response by comparing its quality against responses in different conversations. PairEval is built on open-source, moderate-size language models that we specialize for pairwise comparison between dialogue responses. Extensive experiments on multiple benchmarks demonstrate that our metric correlates more strongly with human judgments than baseline metrics. We also find that the proposed comparative metric is more robust in detecting common failures of open-domain dialogue systems, including repetition and speaker insensitivity.
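
To make the idea of comparative scoring concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not say how pairwise judgments are aggregated, so the sketch assumes the final score is simply the mean win probability against a few comparison examples drawn from other conversations. The names `pairwise_score` and `toy_compare` are hypothetical, and `toy_compare` is a word-overlap stand-in for the language-model comparator the abstract describes.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical comparator type: returns the probability that the candidate
# (history, response) pair is better than the comparison pair.
Comparator = Callable[[str, str, str, str], float]


def pairwise_score(
    history: str,
    response: str,
    comparisons: Sequence[Tuple[str, str]],
    compare: Comparator,
) -> float:
    """Mean win probability of the candidate against each comparison pair.

    Assumption: averaging is used here only for illustration; the source
    does not specify the aggregation rule.
    """
    if not comparisons:
        raise ValueError("need at least one comparison example")
    wins = [compare(history, response, h, r) for h, r in comparisons]
    return sum(wins) / len(wins)


def toy_compare(history: str, response: str,
                other_history: str, other_response: str) -> float:
    """Stand-in judge (NOT a language model): prefers the pair whose
    response shares a larger fraction of its words with its own history."""
    def overlap(h: str, r: str) -> float:
        h_words = set(h.lower().split())
        r_words = set(r.lower().split())
        return len(h_words & r_words) / max(len(r_words), 1)

    a = overlap(history, response)
    b = overlap(other_history, other_response)
    if a == b:
        return 0.5  # tie: no preference
    return 1.0 if a > b else 0.0


# Comparison examples taken from other (made-up) conversations.
comparisons = [
    ("do you like tea", "yes i like tea a lot"),
    ("where are you from", "the weather is nice"),
]

good = pairwise_score("what is your favorite movie",
                      "my favorite movie is inception",
                      comparisons, toy_compare)
bad = pairwise_score("what is your favorite movie",
                     "i have a dog",
                     comparisons, toy_compare)
print(good, bad)
```

In a real setup, `toy_compare` would be replaced by a call to a fine-tuned language model that is prompted with both dialogues and returns a preference; the scoring loop itself would stay the same under the averaging assumption.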
