Zhumin Chu

CalibraEval: Calibrating Prediction Distribution to Mitigate Selection Bias in LLMs-as-Judges

Oct 20, 2024
Via arXiv

An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation

Oct 16, 2024
Via arXiv

PRE: A Peer Review Based Large Language Model Evaluator

Jan 28, 2024
Via arXiv

Corrected Evaluation Results of the NTCIR WWW-2, WWW-3, and WWW-4 English Subtasks

Oct 19, 2022
Via arXiv

ConvSearch: A Open-Domain Conversational Search Behavior Dataset

Apr 06, 2022
Via arXiv