Title: Understanding Multimodal Hallucination with Parameter-Free Representation Alignment

Sep 02, 2024

Yueqian Wang, Jianxin Liang, Yuxuan Wang, Huishuai Zhang, Dongyan Zhao

Abstract: Hallucination is a common issue in Multimodal Large Language Models (MLLMs), yet its underlying principles remain poorly understood. In this paper, we investigate which components of MLLMs contribute to object hallucinations. To analyze image representations in isolation from all other factors, we propose a parameter-free representation alignment metric (Pfram) that measures the similarity between any two representation systems without requiring additional trained parameters. Notably, Pfram can also assess how well a neural representation system aligns with the human representation system, as represented by ground-truth annotations of images. By evaluating alignment with object annotations, we demonstrate that this metric correlates strongly and consistently with object hallucination across a wide range of state-of-the-art MLLMs spanning various model architectures and sizes. Furthermore, using this metric, we explore other key questions about image representations in MLLMs, such as the role of different modules, the impact of textual instructions, and potential improvements, including the use of alternative visual encoders. Our code is available at: https://github.com/yellow-binary-tree/Pfram.
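To make the idea of comparing two representation systems without trained parameters concrete, here is a minimal sketch of one generic approach: mutual k-nearest-neighbor overlap over the same set of images. This is an illustrative assumption, not the Pfram definition from the paper (the abstract does not specify it); the function names `knn_indices` and `mutual_knn_alignment` and the default `k` are hypothetical. One system could be the image features of an MLLM, and the other could be multi-hot vectors built from ground-truth object annotations.

```python
import numpy as np

def knn_indices(features, k):
    """Return the indices of the k nearest neighbors (cosine similarity) of each row."""
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normed = features / np.maximum(norms, 1e-12)  # guard against all-zero rows
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # an item is never its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]

def mutual_knn_alignment(feats_a, feats_b, k=5):
    """Average fraction of shared k-nearest neighbors between two systems.

    feats_a and feats_b describe the same items in the same order but may
    have different dimensionality. No parameters are trained: only each
    system's own neighborhood structure is compared.
    """
    nn_a = knn_indices(feats_a, k)
    nn_b = knn_indices(feats_b, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_a, nn_b)]
    return float(np.mean(overlaps))
```

A score of 1.0 means the two systems agree on every image's nearest neighbors, and a score near 0 means their neighborhood structures are unrelated. Because the score depends only on each system's internal similarity ranking, it can be applied to any pair of systems that describe the same images.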
