

Title: MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens

Oct 05, 2023

Kaizhi Zheng, Xuehai He, Xin Eric Wang


Abstract: Large Language Models (LLMs) have garnered significant attention for their advances in natural language processing, demonstrating unparalleled prowess in text comprehension and generation. Yet generating images together with coherent textual narratives remains an open frontier. In response, we introduce an interleaved vision-and-language generation technique built on the concept of "generative vokens," which act as the bridge for harmonized image-text outputs. Our approach uses a distinctive two-stage training strategy focused on description-free multimodal generation, meaning that training does not require comprehensive descriptions of images. To further improve the model, we incorporate classifier-free guidance, which strengthens the effect of the vokens on image generation. Our model, MiniGPT-5, shows substantial improvement over the baseline Divter model on the MMDialog dataset and consistently delivers superior or comparable multimodal outputs in human evaluations on the VIST dataset, highlighting its efficacy across diverse benchmarks.
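The abstract names classifier-free guidance without spelling it out. The sketch below shows the standard formulation in plain Python. During training, the condition is randomly replaced by a "null" condition. At sampling time, the unconditional and conditional noise predictions are combined as `eps_uncond + w * (eps_cond - eps_uncond)`. This is a generic illustration, not MiniGPT-5's implementation. The function names, the `p_uncond` and `guidance_scale` parameters, and the idea that the condition here stands in for voken-derived features are all assumptions. The abstract does not say how the unconditional input is built.

```python
import random
from typing import List, TypeVar

T = TypeVar("T")


def maybe_drop_condition(cond: T, null_cond: T, p_uncond: float, rng: random.Random) -> T:
    """Training-time step (hypothetical helper): with probability p_uncond,
    replace the conditioning (assumed here to stand in for voken features)
    with a null condition so one model learns both branches."""
    return null_cond if rng.random() < p_uncond else cond


def classifier_free_guidance(
    eps_uncond: List[float],
    eps_cond: List[float],
    guidance_scale: float,
) -> List[float]:
    """Sampling-time step: standard classifier-free guidance.

    eps_hat = eps_uncond + w * (eps_cond - eps_uncond)
    With w = 1 this returns eps_cond; w > 1 pushes the prediction
    further in the direction the condition suggests.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same shape")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(maybe_drop_condition("voken_features", "null", 0.1, rng))
    print(classifier_free_guidance([0.0, 1.0], [1.0, 1.0], 2.0))  # [2.0, 1.0]
```

In a real diffusion pipeline these would be tensors from two forward passes of the denoiser. Lists are used here only to keep the arithmetic visible.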

* 20 pages, 9 figures
