Nick Craswell

Towards Understanding Bias in Synthetic Data for Evaluation

Jun 12, 2025

LLM-Evaluation Tropes: Perspectives on the Validity of LLM-Evaluations

Apr 27, 2025

The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models

Apr 21, 2025

Support Evaluation for the TREC 2024 RAG Track: Comparing Human versus LLM Judges

Apr 21, 2025

Judging the Judges: A Collection of LLM-Generated Relevance Judgements

Feb 19, 2025

JudgeBlender: Ensembling Judgments for Automatic Relevance Assessment

Dec 17, 2024

Initial Nugget Evaluation Results for the TREC 2024 RAG Track with the AutoNuggetizer Framework

Nov 14, 2024

A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look

Nov 13, 2024

SynDL: A Large-Scale Synthetic Test Collection for Passage Retrieval

Aug 30, 2024

LLMJudge: LLMs for Relevance Judgments

Aug 09, 2024