Title: Is ChatGPT a Good NLG Evaluator? A Preliminary Study

Mar 07, 2023

Jiaan Wang, Yunlong Liang, Fandong Meng, Haoxiang Shi, Zhixu Li, Jinan Xu, Jianfeng Qu, Jie Zhou

Abstract: The emergence of ChatGPT has recently attracted wide attention from the computational linguistics community. Many prior studies have shown that ChatGPT achieves remarkable performance on various NLP tasks in terms of automatic evaluation metrics. However, the ability of ChatGPT to serve as an evaluation metric itself is still underexplored. Considering that assessing the quality of NLG models is an arduous task and that previous statistical metrics notoriously correlate poorly with human judgments, we ask whether ChatGPT is a good NLG evaluation metric. In this report, we provide a preliminary meta-evaluation of ChatGPT to show its reliability as an NLG metric. In detail, we regard ChatGPT as a human evaluator and give it task-specific (e.g., summarization) and aspect-specific (e.g., relevance) instructions that prompt it to score the outputs of NLG models. We conduct experiments on three widely used NLG meta-evaluation datasets (covering summarization, story generation, and data-to-text tasks). Experimental results show that, compared with previous automatic metrics, ChatGPT achieves state-of-the-art or competitive correlation with golden human judgments. We hope our preliminary study can prompt the emergence of a general-purpose, reliable NLG metric.

* Technical Report, 8 pages
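
The meta-evaluation described in the abstract has two parts: build a task- and aspect-specific scoring prompt, then measure how well the resulting scores correlate with human judgments. The Python sketch below illustrates both under stated assumptions. The prompt wording in `build_prompt`, the 1-to-5 scale, and the choice of Spearman rank correlation are illustrative assumptions and are not taken from the paper. The scores in the usage lines are made-up toy values, not experimental results. No model is called; the sketch assumes the metric scores have already been collected.

```python
from statistics import mean


def build_prompt(task, aspect, source, output, low=1, high=5):
    """Hypothetical scoring prompt; the wording is an assumption, not the paper's."""
    return (
        f"Score the following {task} with respect to {aspect} "
        f"on a scale from {low} to {high}, where {high} is best.\n\n"
        f"Source:\n{source}\n\n"
        f"Output:\n{output}\n\n"
        f"Score:"
    )


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values for illustration only (not from the paper).
metric_scores = [4, 2, 5, 3, 1]
human_scores = [5, 2, 4, 3, 1]
rho = spearman(metric_scores, human_scores)
prompt = build_prompt("summary", "relevance", "An article.", "A summary.")
```

A higher `rho` means the metric ranks outputs more like the human annotators do. Rank correlation is used here because it only depends on the ordering of scores, so it does not require the metric and the humans to use the same scale.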
