Daniel Deutsch

WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects
Feb 18, 2025

Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination's Impact on Machine Translation
Jan 30, 2025

Mitigating Metric Bias in Minimum Bayes Risk Decoding
Nov 05, 2024

Beyond Human-Only: Evaluating Human-Machine Collaboration for Collecting High-Quality Translation Data
Oct 14, 2024

MetricX-24: The Google Submission to the WMT 2024 Metrics Shared Task
Oct 04, 2024

Improving Statistical Significance in Human Evaluation of Automatic Metrics via Soft Pairwise Accuracy
Sep 15, 2024

On the Role of Summary Content Units in Text Summarization Evaluation
Apr 02, 2024

Finding Replicable Human Evaluations via Stable Ranking Probability
Apr 01, 2024

Pinpoint, Not Criticize: Refining Large Language Models via Fine-Grained Actionable Feedback
Nov 15, 2023

There's no Data Like Better Data: Using QE Metrics for MT Data Filtering
Nov 09, 2023