Nick Craswell

Beyond Output Critique: Self-Correction via Task Distillation

Jan 31, 2026

Fine-tuning Small Language Models as Efficient Enterprise Search Relevance Labelers

Jan 06, 2026

All Claims Are Equal, but Some Claims Are More Equal Than Others: Importance-Sensitive Factuality Evaluation of LLM Generations

Oct 08, 2025

Towards Understanding Bias in Synthetic Data for Evaluation

Jun 12, 2025

LLM-Evaluation Tropes: Perspectives on the Validity of LLM-Evaluations

Apr 27, 2025

Support Evaluation for the TREC 2024 RAG Track: Comparing Human versus LLM Judges

Apr 21, 2025

The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models

Apr 21, 2025

Judging the Judges: A Collection of LLM-Generated Relevance Judgements

Feb 19, 2025

JudgeBlender: Ensembling Judgments for Automatic Relevance Assessment

Dec 17, 2024

Initial Nugget Evaluation Results for the TREC 2024 RAG Track with the AutoNuggetizer Framework

Nov 14, 2024