The success of large language models has inspired researchers to transfer their exceptional representational ability to other modalities. Several recent works leverage image-caption alignment datasets to train multimodal large language models (MLLMs), which achieve state-of-the-art performance on image-to-text tasks. However, few studies have explored whether MLLMs truly understand complete image information, i.e., global information, or whether they only capture local object information. In this study, we find that the intermediate layers of MLLMs encode more global semantic information than the topmost layers: their representation vectors perform better on visual-language entailment tasks. We further probe the models' local semantic representations through object detection tasks and conclude that the topmost layers may focus excessively on local information, which diminishes their ability to encode global information.
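The layer-wise comparison described above can be illustrated with a minimal linear-probing sketch. This is an assumption-laden illustration, not the paper's exact setup: it assumes a HuggingFace-style MLLM that returns per-layer hidden states via `output_hidden_states=True`, mean-pools token representations into one vector per layer, and fits a separate linear probe per layer on hypothetical (image, text, entailment_label) triples.

```python
# Minimal layer-wise probing sketch (illustrative assumptions, not the paper's exact setup).
import torch
import torch.nn as nn

def pooled_layer_features(model, processor, image, text):
    """Return one mean-pooled feature vector per layer for an image-text pair."""
    inputs = processor(images=image, text=text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # out.hidden_states: tuple of (batch, seq_len, dim) tensors, one per layer.
    return [h.mean(dim=1).squeeze(0) for h in out.hidden_states]

def train_linear_probe(features, labels, dim, num_classes=3, epochs=20, lr=1e-3):
    """Fit a linear probe on frozen pooled features from a single layer."""
    probe = nn.Linear(dim, num_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    X, y = torch.stack(features), torch.tensor(labels)
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(probe(X), y)
        loss.backward()
        opt.step()
    return probe
```

Probing each layer separately and comparing the probes' entailment accuracy is what allows intermediate layers to be contrasted with the topmost layers: the layer whose probe scores highest is taken to carry the most global semantic information.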