Title: CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation

Nov 30, 2023

Zineng Tang, Ziyi Yang, Mahmoud Khademi, Yang Liu, Chenguang Zhu, Mohit Bansal

Abstract: We present CoDi-2, a versatile and interactive Multimodal Large Language Model (MLLM) that can follow complex multimodal interleaved instructions, conduct in-context learning (ICL), reason, chat, edit, etc., in an any-to-any input-output modality paradigm. By aligning modalities with language for both encoding and generation, CoDi-2 empowers Large Language Models (LLMs) to not only understand complex modality-interleaved instructions and in-context examples, but also autoregressively generate grounded and coherent multimodal outputs in the continuous feature space. To train CoDi-2, we build a large-scale generation dataset encompassing in-context multimodal instructions across text, vision, and audio. CoDi-2 demonstrates a wide range of zero-shot capabilities for multimodal generation, such as in-context learning, reasoning, and compositionality of any-to-any modality generation through multi-round interactive conversation. CoDi-2 surpasses previous domain-specific models on tasks such as subject-driven image generation, vision transformation, and audio editing. CoDi-2 signifies a substantial breakthrough in developing a comprehensive multimodal foundation model adept at interpreting in-context language-vision-audio interleaved instructions and producing multimodal outputs.

* Project Page: https://codi-2.github.io/
