Title: Overconfidence is Key: Verbalized Uncertainty Evaluation in Large Language and Vision-Language Models

May 05, 2024

Tobias Groot, Matias Valdenegro-Toro

Abstract: Language and Vision-Language Models (LLMs/VLMs) have revolutionized the field of AI with their ability to generate human-like text and understand images, but ensuring their reliability is crucial. This paper evaluates the ability of LLMs (GPT4, GPT-3.5, LLaMA2, and PaLM 2) and VLMs (GPT4V and Gemini Pro Vision) to estimate their verbalized uncertainty via prompting. We propose the new Japanese Uncertain Scenes (JUS) dataset, aimed at testing VLM capabilities via difficult queries and object counting, and the Net Calibration Error (NCE) to measure the direction of miscalibration. Results show that both LLMs and VLMs have a high calibration error and are overconfident most of the time, indicating a poor capability for uncertainty estimation. Additionally, we develop prompts for regression tasks, and we show that VLMs have poor calibration when producing mean/standard deviation and 95% confidence intervals.

* 8 pages, with appendix. To appear in TrustNLP workshop @ NAACL 2024
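
The abstract names the Net Calibration Error but does not give its formula. As a rough illustration only, the sketch below assumes NCE is the signed counterpart of the standard Expected Calibration Error (ECE): the same per-bin gap between accuracy and mean stated confidence, weighted by bin size, but without the absolute value, so that a negative result indicates overconfidence. The function name `calibration_errors`, the equal-width binning, and the sign convention are assumptions, not taken from the paper.

```python
from typing import Sequence, Tuple


def calibration_errors(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> Tuple[float, float]:
    """Return (ece, net) for verbalized confidences in [0, 1].

    ece: weighted mean of |accuracy - confidence| per bin (standard ECE).
    net: weighted mean of (accuracy - confidence) per bin; this signed form
         is an ASSUMED reading of NCE, negative meaning overconfident.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")

    conf_sum = [0.0] * n_bins  # sum of stated confidences per bin
    hits = [0] * n_bins        # number of correct answers per bin
    counts = [0] * n_bins      # number of answers per bin

    for c, ok in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hits[idx] += int(ok)
        counts[idx] += 1

    ece = 0.0
    net = 0.0
    for b in range(n_bins):
        if counts[b] == 0:
            continue  # empty bins contribute nothing
        gap = hits[b] / counts[b] - conf_sum[b] / counts[b]
        weight = counts[b] / n
        ece += weight * abs(gap)
        net += weight * gap
    return ece, net
```

Under these assumptions, a model that states 90% confidence on every answer but is right only half the time gets an ECE of about 0.4 and a net value of about -0.4. A model that is underconfident on some answers and equally overconfident on others can have a large ECE while its net value is near zero, which is why the two numbers are read together.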
