Title: Unified Discrete Diffusion for Simultaneous Vision-Language Generation

Nov 27, 2022

Minghui Hu, Chuanxia Zheng, Heliang Zheng, Tat-Jen Cham, Chaoyue Wang, Zuopeng Yang, Dacheng Tao, Ponnuthurai N. Suganthan

Abstract: Recently developed discrete diffusion models perform extraordinarily well on the text-to-image task, showing significant promise for handling multi-modal signals. In this work, we harness these traits and present a unified multimodal generation model that can perform both "modality translation" and "multi-modality generation" tasks with a single model, covering text-based, image-based, and even simultaneous vision-language generation. Specifically, we unify the discrete diffusion process for multimodal signals by proposing a unified transition matrix. Moreover, we design a mutual attention module with a fused embedding layer and a unified objective function to emphasise the inter-modal linkages, which are vital for multi-modality generation. Extensive experiments indicate that our proposed method performs comparably to state-of-the-art solutions on various generation tasks.
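
The abstract names a unified transition matrix but does not define it. The following is a minimal sketch of one plausible construction, not the authors' implementation. It assumes a joint vocabulary made of image tokens, text tokens, and one shared absorbing [MASK] state, with no direct transitions between modalities. The function name `unified_transition_matrix` and the parameters `alpha` (extra probability of keeping a token) and `gamma` (probability of masking) are hypothetical and chosen for illustration only.

```python
def unified_transition_matrix(num_image, num_text, alpha, gamma):
    """Build a row-stochastic matrix q where q[i][j] = P(next = j | current = i).

    Assumed layout (illustrative, not taken from the paper):
      indices [0, num_image)                    -> image tokens
      indices [num_image, num_image + num_text) -> text tokens
      last index                                -> shared [MASK] state
    """
    size = num_image + num_text + 1
    mask = size - 1
    # Uniform resampling mass inside each modality, chosen so every row sums to 1.
    beta_image = (1.0 - alpha - gamma) / num_image
    beta_text = (1.0 - alpha - gamma) / num_text

    q = [[0.0] * size for _ in range(size)]
    for i in range(mask):
        is_image = i < num_image
        lo, hi = (0, num_image) if is_image else (num_image, mask)
        beta = beta_image if is_image else beta_text
        for j in range(lo, hi):
            q[i][j] = beta      # resample within the same modality only
        q[i][i] += alpha        # extra mass for keeping the current token
        q[i][mask] = gamma      # replace the token with [MASK]
    q[mask][mask] = 1.0         # [MASK] is absorbing
    return q


# Toy example: 4 image tokens, 3 text tokens, 1 mask token.
q = unified_transition_matrix(num_image=4, num_text=3, alpha=0.9, gamma=0.05)
```

Under these assumptions the cross-modal blocks of the matrix are zero, so an image token is never corrupted into a text token or vice versa; the two modalities share only the [MASK] state.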

View paper on OpenReview
