Nick Craswell

Memory Makes the Difference: Evaluating How Different Memory Roles Shape Conversational Agents

Jun 24, 2026

Overview of the TREC 2025 Retrieval Augmented Generation (RAG) Track

Mar 10, 2026

Beyond Output Critique: Self-Correction via Task Distillation

Jan 31, 2026

Fine-tuning Small Language Models as Efficient Enterprise Search Relevance Labelers

Jan 06, 2026

All Claims Are Equal, but Some Claims Are More Equal Than Others: Importance-Sensitive Factuality Evaluation of LLM Generations

Oct 08, 2025

Towards Understanding Bias in Synthetic Data for Evaluation

Jun 12, 2025

LLM-Evaluation Tropes: Perspectives on the Validity of LLM-Evaluations

Apr 27, 2025

The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models

Apr 21, 2025

Support Evaluation for the TREC 2024 RAG Track: Comparing Human versus LLM Judges

Apr 21, 2025

Judging the Judges: A Collection of LLM-Generated Relevance Judgements

Feb 19, 2025