Title: Multimodal Large Language Models: A Survey

Nov 22, 2023

Jiayang Wu, Wensheng Gan, Zefeng Chen, Shicheng Wan, Philip S. Yu

Abstract: The exploration of multimodal language models integrates multiple data types, such as images, text, language, audio, and other heterogeneous data. While the latest large language models excel in text-based tasks, they often struggle to understand and process other data types. Multimodal models address this limitation by combining various modalities, enabling a more comprehensive understanding of diverse data. This paper begins by defining the concept of multimodality and examining the historical development of multimodal algorithms. Furthermore, we introduce a range of multimodal products, focusing on the efforts of major technology companies. A practical guide is provided, offering insights into the technical aspects of multimodal models. Moreover, we present a compilation of the latest algorithms and commonly used datasets, providing researchers with valuable resources for experimentation and evaluation. Lastly, we explore the applications of multimodal models and discuss the challenges associated with their development. By addressing these aspects, this paper aims to facilitate a deeper understanding of multimodal models and their potential in various domains.

* IEEE BigData 2023. 10 pages
