Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Suman Devkota

GPT-4's assessment of its performance in a USMLE-based case study

Feb 15, 2024

Uttam Dhakal, Aniket Kumar Singh, Suman Devkota, Yogesh Sapkota, Bishal Lamichhane, Suprinsa Paudyal, Chandra Dhakal

Figure 1 for GPT-4's assessment of its performance in a USMLE-based case study

Figure 2 for GPT-4's assessment of its performance in a USMLE-based case study

Figure 3 for GPT-4's assessment of its performance in a USMLE-based case study

Figure 4 for GPT-4's assessment of its performance in a USMLE-based case study

Abstract:This study investigates GPT-4's assessment of its performance in healthcare applications. A simple prompting technique was used to prompt the LLM with questions taken from the United States Medical Licensing Examination (USMLE) questionnaire and it was tasked to evaluate its confidence score before posing the question and after asking the question. The questionnaire was categorized into two groups-questions with feedback (WF) and questions with no feedback(NF) post-question. The model was asked to provide absolute and relative confidence scores before and after each question. The experimental findings were analyzed using statistical tools to study the variability of confidence in WF and NF groups. Additionally, a sequential analysis was conducted to observe the performance variation for the WF and NF groups. Results indicate that feedback influences relative confidence but doesn't consistently increase or decrease it. Understanding the performance of LLM is paramount in exploring its utility in sensitive areas like healthcare. This study contributes to the ongoing discourse on the reliability of AI, particularly of LLMs like GPT-4, within healthcare, offering insights into how feedback mechanisms might be optimized to enhance AI-assisted medical education and decision support.

Via

Access Paper or Ask Questions

The Confidence-Competence Gap in Large Language Models: A Cognitive Study

Sep 28, 2023

Aniket Kumar Singh, Suman Devkota, Bishal Lamichhane, Uttam Dhakal, Chandra Dhakal

Figure 1 for The Confidence-Competence Gap in Large Language Models: A Cognitive Study

Figure 2 for The Confidence-Competence Gap in Large Language Models: A Cognitive Study

Figure 3 for The Confidence-Competence Gap in Large Language Models: A Cognitive Study

Figure 4 for The Confidence-Competence Gap in Large Language Models: A Cognitive Study

Abstract:Large Language Models (LLMs) have acquired ubiquitous attention for their performances across diverse domains. Our study here searches through LLMs' cognitive abilities and confidence dynamics. We dive deep into understanding the alignment between their self-assessed confidence and actual performance. We exploit these models with diverse sets of questionnaires and real-world scenarios and extract how LLMs exhibit confidence in their responses. Our findings reveal intriguing instances where models demonstrate high confidence even when they answer incorrectly. This is reminiscent of the Dunning-Kruger effect observed in human psychology. In contrast, there are cases where models exhibit low confidence with correct answers revealing potential underestimation biases. Our results underscore the need for a deeper understanding of their cognitive processes. By examining the nuances of LLMs' self-assessment mechanism, this investigation provides noteworthy revelations that serve to advance the functionalities and broaden the potential applications of these formidable language models.

* 19 pages, 8 Figures, to be published in a journal (Journal TBD), All Authors contributed equally and were Supervised by Chandra Dhakal

Via

Access Paper or Ask Questions