Large Language Models (LLMs) have exhibited remarkable performance on various Natural Language Processing (NLP) tasks, yet their reasoning capacity remains the subject of ongoing debate. In this paper, we examine the performance of the GPT-3.5 and GPT-4 models through a thorough technical evaluation of different reasoning tasks across eleven distinct datasets. Our findings show that GPT-4 outperforms GPT-3.5 in zero-shot learning on almost all evaluated tasks. We also note that both models exhibit limited performance on inductive, mathematical, and multi-hop reasoning tasks. While it may seem intuitive that GPT-4 would outperform GPT-3.5 given its scale and effectiveness on various NLP tasks, our paper offers empirical evidence to support this claim. We provide a detailed and comprehensive analysis of the results from both models to further support our findings. In addition, we propose a set of engineered prompts that improve the zero-shot performance of both models.