Pius von Däniken

Favi-Score: A Measure for Favoritism in Automated Preference Ratings for Generative AI Evaluation

Jun 03, 2024
Via arXiv

Correction of Errors in Preference Ratings from Automated Metrics for Text Generation

Jun 06, 2023
Via arXiv

On the Effectiveness of Automated Metrics for Text Generation Systems

Oct 24, 2022
Via arXiv

Probing the Robustness of Trained Metrics for Conversational Dialogue Systems

Feb 28, 2022
Via arXiv

Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems

Oct 05, 2020
Via arXiv