Title: ALLURE: Auditing and Improving LLM-based Evaluation of Text using Iterative In-Context-Learning

Sep 27, 2023

Hosein Hasanbeig, Hiteshi Sharma, Leo Betthauser, Felipe Vieira Frujeri, Ida Momennejad

Abstract: From grading papers to summarizing medical documents, large language models (LLMs) are increasingly used to evaluate text generated by humans and AI alike. However, despite their extensive utility, LLMs exhibit distinct failure modes, necessitating a thorough audit and improvement of their text evaluation capabilities. Here we introduce ALLURE, a systematic approach to Auditing Large Language Models Understanding and Reasoning Errors. ALLURE involves comparing LLM-generated evaluations with annotated data, and iteratively incorporating instances of significant deviation into the evaluator, which leverages in-context learning (ICL) to make its evaluation of text more robust. Through this iterative process, we refine the performance of the evaluator LLM, ultimately reducing reliance on human annotators in the evaluation process. We anticipate that ALLURE will serve diverse applications of LLMs in domains related to the evaluation of textual data, such as medical summarization, education, and productivity.
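
The loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `audit_loop` and `toy_evaluator`, the absolute-difference deviation measure, the `threshold` value, and the round limit are all assumptions made for illustration, and a toy scoring function stands in for the evaluator LLM.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, float]  # (text, human-annotated score)


def audit_loop(
    evaluate: Callable[[str, List[Example]], float],
    annotated: List[Example],
    threshold: float = 1.0,  # assumed cutoff for a "significant deviation"
    max_rounds: int = 3,     # assumed iteration budget
) -> List[Example]:
    """Collect annotated examples on which the evaluator deviates from the labels.

    Each round scores every annotated item with the current in-context
    examples, then adds the items that deviate by more than `threshold`
    to the in-context set used in the next round.
    """
    icl_examples: List[Example] = []
    for _ in range(max_rounds):
        failures = [
            (text, gold)
            for text, gold in annotated
            if (text, gold) not in icl_examples
            and abs(evaluate(text, icl_examples) - gold) > threshold
        ]
        if not failures:
            break  # evaluator agrees with the annotations within the threshold
        icl_examples.extend(failures)
    return icl_examples


def toy_evaluator(text: str, icl_examples: List[Example]) -> float:
    """Hypothetical stand-in for an LLM evaluator.

    Returns the annotated score when the text appears among the in-context
    examples, and otherwise guesses the score from the word count.
    """
    for seen_text, seen_score in icl_examples:
        if seen_text == text:
            return seen_score
    return float(len(text.split()))


annotated = [
    ("good summary", 2.0),
    ("a very long but poor summary", 1.0),
]
collected = audit_loop(toy_evaluator, annotated)
print(collected)
```

In a real setting, `evaluate` would build a prompt from the collected examples and call an LLM; the toy function only makes the control flow observable without any model access.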
