Title: Do DALL-E and Flamingo Understand Each Other?

Dec 23, 2022

Hang Li, Jindong Gu, Rajat Koner, Sahand Sharifzadeh, Volker Tresp

Abstract: A major goal of multimodal research is to improve machine understanding of images and text. Tasks include image captioning, text-to-image generation, and vision-language representation learning. So far, research has focused on the relationships between images and text. For example, captioning models attempt to understand the semantics of images, which are then transformed into text. An important question is: which annotation best reflects a deep understanding of image content? Similarly, given a text, which image best presents the semantics of that text? In this work, we argue that the best text or caption for a given image is the text that would generate the image most similar to the original image. Likewise, the best image for a given text is the image that yields the caption best aligned with the original text. To this end, we propose a unified framework that includes both a text-to-image generative model and an image-to-text generative model. Extensive experiments validate our approach.
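The caption-selection criterion stated in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: `best_caption`, `toy_text_to_image`, and `toy_embed_image` are hypothetical names, the toy "generator" just counts keywords, and cosine similarity is assumed as the image-similarity measure. A real pipeline would plug in an actual text-to-image model and an image encoder in their place.

```python
import math
from typing import Callable, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_caption(
    image_embedding: Sequence[float],
    candidate_captions: Sequence[str],
    text_to_image: Callable[[str], object],
    embed_image: Callable[[object], Sequence[float]],
) -> str:
    """Pick the caption whose regenerated image is most similar to the original.

    `text_to_image` and `embed_image` are placeholders (assumptions) for a
    text-to-image generator and an image encoder.
    """
    def score(caption: str) -> float:
        reconstructed = text_to_image(caption)
        return cosine_similarity(image_embedding, embed_image(reconstructed))

    return max(candidate_captions, key=score)


# --- Toy stand-ins so the sketch runs without any model (purely hypothetical) ---
VOCAB = ("dog", "ball", "grass", "cat")


def toy_text_to_image(caption: str) -> List[float]:
    """Pretend 'image': a count of each vocabulary word in the caption."""
    words = caption.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def toy_embed_image(image: List[float]) -> List[float]:
    """Pretend encoder: the toy image already is its own embedding."""
    return list(image)


original_image_embedding = [1.0, 1.0, 1.0, 0.0]  # contains dog, ball, grass
captions = ["a cat", "a dog", "a dog chasing a ball on grass"]

chosen = best_caption(
    original_image_embedding, captions, toy_text_to_image, toy_embed_image
)
print(chosen)
```

The symmetric direction described in the abstract (choosing the best image for a text) would follow the same pattern with the roles swapped: caption each candidate image and keep the one whose caption is closest to the original text.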
