Elizabeth Clark

Evaluating LLMs for Targeted Concept Simplification for Domain-Specific Texts

Oct 28, 2024

Agents' Room: Narrative Generation through Multi-step Collaboration

Oct 03, 2024

SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation

May 22, 2023

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

May 02, 2023

Needle in a Haystack: An Analysis of Finding Qualified Workers on MTurk for Summarization

Dec 28, 2022

mFACE: Multilingual Summarization with Factual Consistency Evaluation

Dec 20, 2022

Dialect-robust Evaluation of Generated Text

Nov 02, 2022

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Jun 24, 2022

Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text

Feb 14, 2022

All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text

Jul 07, 2021