Title: Large Language Models can Share Images, Too!

Oct 23, 2023

Young-Jun Lee, Jonghwan Hyeon, Ho-Jin Choi

Abstract: This paper explores the image-sharing capability of Large Language Models (LLMs), such as InstructGPT, ChatGPT, and GPT-4, in a zero-shot setting, without the help of visual foundation models. Inspired by the two-stage process of image-sharing in human dialogues, we propose a two-stage framework that allows LLMs to predict potential image-sharing turns and generate related image descriptions using our effective restriction-based prompt template. With extensive experiments, we unlock the image-sharing capability of LLMs in zero-shot prompting, with GPT-4 achieving the best performance. Additionally, we uncover the emergent image-sharing ability in zero-shot prompting, demonstrating the effectiveness of restriction-based prompts in both stages of our framework. Based on this framework, we augment the PhotoChat dataset with images generated by Stable Diffusion at predicted turns, namely PhotoChat++. To our knowledge, this is the first study to assess the image-sharing ability of LLMs in a zero-shot setting without visual foundation models. The source code and the dataset will be released after publication.
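
The two-stage framework described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the prompt wording, the function names, and the `LLM` callable are all hypothetical, and "restriction" is interpreted here simply as constraining the format of the model's reply, which is an assumption. Stage 1 asks the model which turns are likely image-sharing turns; stage 2 asks for a description of the image at each predicted turn.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]
Turn = Tuple[str, str]  # (speaker, utterance)


def format_dialogue(turns: List[Turn]) -> str:
    """Render the dialogue with explicit turn indices so the model can refer to them."""
    return "\n".join(f"[{i}] {speaker}: {text}" for i, (speaker, text) in enumerate(turns))


def predict_sharing_turns(turns: List[Turn], llm: LLM) -> List[int]:
    """Stage 1: ask for turn indices where an image would likely be shared.

    The prompt text is illustrative only; the paper's actual template is not
    given in the abstract.
    """
    prompt = (
        "Dialogue:\n" + format_dialogue(turns) + "\n\n"
        "List the indices of turns after which a speaker would likely share an image.\n"
        "Restriction: answer only with comma-separated integers, or 'none'."
    )
    reply = llm(prompt).strip().lower()
    if reply == "none":
        return []
    indices = []
    for token in reply.split(","):
        token = token.strip()
        # Keep only well-formed, in-range indices; ignore anything else.
        if token.isdigit() and int(token) < len(turns):
            indices.append(int(token))
    return indices


def describe_image(turns: List[Turn], turn_index: int, llm: LLM) -> str:
    """Stage 2: ask for a description of the image shared at the given turn."""
    context = format_dialogue(turns[: turn_index + 1])
    prompt = (
        "Dialogue so far:\n" + context + "\n\n"
        "Describe the image that would be shared next.\n"
        "Restriction: answer with a single sentence starting with 'An image of'."
    )
    return llm(prompt).strip()


def image_sharing_pipeline(turns: List[Turn], llm: LLM) -> Dict[int, str]:
    """Run both stages and map each predicted turn index to an image description."""
    return {i: describe_image(turns, i, llm) for i in predict_sharing_turns(turns, llm)}
```

Under these assumptions, each returned description could then be passed to a text-to-image model, which is how the abstract describes building PhotoChat++ with Stable Diffusion. Injecting the model as a plain callable keeps the sketch independent of any particular LLM API.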
