Title: Unbiased Evaluation of Large Language Models from a Causal Perspective

Feb 10, 2025

Meilin Chen, Jian Tian, Liang Ma, Di Xie, Weijie Chen, Jiang Zhu

Abstract: Benchmark contamination has become a significant concern in the LLM evaluation community. Previous Agents-as-an-Evaluator methods address this issue by involving agents in the generation of questions. Despite their success, the biases in Agents-as-an-Evaluator methods remain largely unexplored. In this paper, we present a theoretical formulation of evaluation bias, providing valuable insights into designing unbiased evaluation protocols. Furthermore, we identify two types of bias in Agents-as-an-Evaluator through carefully designed probing tasks on a minimal Agents-as-an-Evaluator setup. To address these issues, we propose the Unbiased Evaluator, an evaluation protocol that delivers a more comprehensive, unbiased, and interpretable assessment of LLMs. Extensive experiments reveal significant room for improvement in current LLMs. Additionally, we demonstrate that the Unbiased Evaluator not only offers strong evidence of benchmark contamination but also provides interpretable evaluation results.
