Title: An Examination of the Compositionality of Large Generative Vision-Language Models

Aug 21, 2023

Teli Ma, Rong Li, Junwei Liang

Abstract: With the success of Large Language Models (LLMs), a surge of Generative Vision-Language Models (GVLMs) has been constructed via multimodal instruction tuning. This tuning recipe deviates substantially from the common contrastive vision-language learning. However, the performance of GVLMs in multimodal compositional reasoning remains largely unexplored, as existing evaluation metrics and benchmarks focus predominantly on assessing contrastive models like CLIP. In this paper, we examine potential evaluation metrics for assessing GVLMs and hypothesize that generative score methods are suitable for evaluating compositionality. In addition, current benchmarks tend to prioritize syntactic correctness over semantics. The morphological bias present in these benchmarks can be exploited by GVLMs, leading to ineffective evaluations. To combat this, we define a MorphoBias Score to quantify the morphological bias and propose a novel LLM-based strategy to calibrate the bias. Moreover, a challenging task is added to evaluate the robustness of GVLMs against their inherent inclination toward syntactic correctness. We incorporate the calibrated dataset and the task into a new benchmark, namely the MOrphological De-biased Benchmark (MODE). Our study provides the first unbiased benchmark for the compositionality of GVLMs, facilitating future research in this direction. We will release our code and datasets.
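
As a rough illustration of what a generative score could look like in a compositionality test, the sketch below compares a correct caption against a hard-negative caption using length-normalized log-likelihood. This is an assumption made for illustration only, not the scoring rule defined in the paper: the function names, the normalization choice, and the token log-probabilities are all hypothetical, and a real evaluation would obtain the log-probabilities from a GVLM conditioned on the image.

```python
from typing import Iterable, Sequence, Tuple


def generative_score(token_logprobs: Sequence[float]) -> float:
    """Length-normalized log-likelihood of one caption.

    Hypothetical scoring rule: averaging per-token log-probabilities is an
    assumption made here so that longer captions are not penalized merely
    for having more tokens.
    """
    if not token_logprobs:
        raise ValueError("caption must contain at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def prefers_positive(pos: Sequence[float], neg: Sequence[float]) -> bool:
    """True when the correct caption scores strictly higher than the negative."""
    return generative_score(pos) > generative_score(neg)


def pairwise_accuracy(
    pairs: Iterable[Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Fraction of (positive, negative) caption pairs ranked correctly."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one caption pair")
    correct = sum(prefers_positive(pos, neg) for pos, neg in pairs)
    return correct / len(pairs)


# Made-up per-token log-probabilities, for illustration only.
example_pairs = [
    ([-0.1, -0.2], [-1.0, -2.0]),  # positive caption is more likely
    ([-3.0], [-0.5]),              # model wrongly prefers the negative
]
accuracy = pairwise_accuracy(example_pairs)
```

Under this setup, a model that leans on surface fluency rather than meaning could still rank pairs correctly when the negative caption is simply less well-formed, which is the kind of bias the abstract says the benchmark is meant to control for.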
