Title: MM-LLMs: Recent Advances in MultiModal Large Language Models

Jan 25, 2024

Duzhen Zhang, Yahan Yu, Chenxing Li, Jiahua Dong, Dan Su, Chenhui Chu, Dong Yu

Abstract: In the past year, MultiModal Large Language Models (MM-LLMs) have advanced substantially, augmenting off-the-shelf LLMs to support MM inputs or outputs via cost-effective training strategies. The resulting models not only preserve the inherent reasoning and decision-making capabilities of LLMs but also support a diverse range of MM tasks. In this paper, we provide a comprehensive survey aimed at facilitating further research on MM-LLMs. Specifically, we first outline general design formulations for model architecture and training pipeline. We then give brief introductions to 26 existing MM-LLMs, each characterized by its specific formulations. Additionally, we review the performance of MM-LLMs on mainstream benchmarks and summarize key training recipes for improving MM-LLMs. Lastly, we explore promising directions for MM-LLMs while maintaining a real-time tracking website for the latest developments in the field. We hope that this survey contributes to the ongoing advancement of the MM-LLMs domain.

* Work in progress
