Title: Exploring Precision and Recall to assess the quality and diversity of LLMs

Feb 28, 2024

Florian Le Bronnec, Alexandre Verine, Benjamin Negrevergne, Yann Chevaleyre, Alexandre Allauzen

Abstract: This paper introduces a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral, focusing on the adaptation of Precision and Recall metrics from image generation to text generation. This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora. By conducting a comprehensive evaluation of state-of-the-art language models, the study reveals significant insights into their performance on open-ended generation tasks, which are not adequately captured by traditional benchmarks. The findings highlight a trade-off between the quality and diversity of generated samples, particularly when models are fine-tuned with human feedback. This work extends the toolkit for distribution-based NLP evaluation, offering insights into the practical capabilities and challenges faced by current LLMs in generating diverse and high-quality text.
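To make the idea of distribution-level Precision and Recall concrete, here is a minimal sketch of one common formulation from the image-generation literature, the k-nearest-neighbour support estimate. The abstract does not say which estimator the paper uses, so treat this as an illustration under assumptions rather than the authors' method. The function names (`knn_radii`, `coverage`, `precision_recall`) and the toy 2-D points are hypothetical; for text, each row would instead be an embedding of a reference or generated sample from some pretrained encoder.

```python
import numpy as np


def knn_radii(points: np.ndarray, k: int) -> np.ndarray:
    """Distance from each point to its k-th nearest neighbour in the same set."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # After sorting, column 0 is the zero distance of a point to itself,
    # so column k holds the distance to the k-th nearest neighbour.
    return np.sort(d, axis=1)[:, k]


def coverage(queries: np.ndarray, support: np.ndarray, k: int) -> float:
    """Fraction of queries lying inside at least one k-NN ball of the support set."""
    radii = knn_radii(support, k)
    d = np.linalg.norm(queries[:, None, :] - support[None, :, :], axis=-1)
    return float(np.mean((d <= radii[None, :]).any(axis=1)))


def precision_recall(real: np.ndarray, generated: np.ndarray, k: int = 2):
    """Precision: generated samples covered by the real support (quality).
    Recall: real samples covered by the generated support (diversity)."""
    return coverage(generated, real, k), coverage(real, generated, k)


# Toy data (assumption): the reference set has two modes, the "model" covers one.
real = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
                 [100, 100], [101, 100], [100, 101], [101, 101]], dtype=float)
generated = np.array([[0.5, 0], [0, 0.5], [1, 0.5], [0.5, 1]], dtype=float)

precision, recall = precision_recall(real, generated, k=2)
print(precision, recall)  # 1.0 0.5
```

In this toy case every generated point lies close to the reference data, so precision is 1.0, while half of the reference points (the second mode) are never reached, so recall is 0.5. This is the kind of quality-versus-diversity gap the abstract refers to, and it needs no aligned pairs of reference and generated texts.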

* 21 pages, 15 figures, Under Review
