
Saad Mahamood

Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices

Aug 17, 2024

On the Role of Summary Content Units in Text Summarization Evaluation

Apr 02, 2024

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

May 02, 2023

Needle in a Haystack: An Analysis of Finding Qualified Workers on MTurk for Summarization

Dec 28, 2022

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Jun 24, 2022

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Dec 06, 2021

Underreporting of errors in NLG output, and what to do about it

Aug 08, 2021

Automatic Construction of Evaluation Suites for Natural Language Generation Datasets

Jun 16, 2021

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

Feb 03, 2021