Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models

Dec 19, 2024

Zijun Chen, Wenbo Hu, Guande He, Zhijie Deng, Zheng Zhang, Richang Hong

Figure 1 for Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models

Figure 2 for Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models

Figure 3 for Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models

Figure 4 for Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models

Share this with someone who'll enjoy it:

Abstract:Multimodal large language models (MLLMs) combine visual and textual data for tasks such as image captioning and visual question answering. Proper uncertainty calibration is crucial, yet challenging, for reliable use in areas like healthcare and autonomous driving. This paper investigates representative MLLMs, focusing on their calibration across various scenarios, including before and after visual fine-tuning, as well as before and after multimodal training of the base LLMs. We observed miscalibration in their performance, and at the same time, no significant differences in calibration across these scenarios. We also highlight how uncertainty differs between text and images and how their integration affects overall uncertainty. To better understand MLLMs' miscalibration and their ability to self-assess uncertainty, we construct the IDK (I don't know) dataset, which is key to evaluating how they handle unknowns. Our findings reveal that MLLMs tend to give answers rather than admit uncertainty, but this self-assessment improves with proper prompt adjustments. Finally, to calibrate MLLMs and enhance model reliability, we propose techniques such as temperature scaling and iterative prompt optimization. Our results provide insights into improving MLLMs for effective and responsible deployment in multimodal applications. Code and IDK dataset: \href{https://github.com/hfutml/Calibration-MLLM}{https://github.com/hfutml/Calibration-MLLM}.

* Accepted to COLING 2025

View paper on

Share this with someone who'll enjoy it:

Title:Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models

Paper and Code