Binwang Wan

A Survey of Multimodal Large Language Model from A Data-centric Perspective

May 26, 2024

Tianyi Bai, Hao Liang, Binwang Wan, Ling Yang, Bozhou Li, Yifan Wang, Bin Cui, Conghui He, Binhang Yuan, Wentao Zhang

Figures 1-4 for A Survey of Multimodal Large Language Model from A Data-centric Perspective

Abstract: Human beings perceive the world through diverse senses such as sight, smell, hearing, and touch. Similarly, multimodal large language models (MLLMs) enhance the capabilities of traditional large language models by integrating and processing data from multiple modalities including text, vision, audio, video, and 3D environments. Data plays a pivotal role in the development and refinement of these models. In this survey, we comprehensively review the literature on MLLMs from a data-centric perspective. Specifically, we explore methods for preparing multimodal data during the pretraining and adaptation phases of MLLMs. Additionally, we analyze the evaluation methods for datasets and review benchmarks for evaluating MLLMs. Our survey also outlines potential future research directions. This work aims to provide researchers with a detailed understanding of the data-driven aspects of MLLMs, fostering further exploration and innovation in this field.
