Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge

Add code
Oct 03, 2024

Share this with someone who'll enjoy it:

View paper onarxiv icon

Share this with someone who'll enjoy it: