Title: Efficient Multimodal Learning from Data-centric Perspective

Feb 18, 2024

Muyang He, Yexin Liu, Boya Wu, Jianhao Yuan, Yueze Wang, Tiejun Huang, Bo Zhao

Abstract: Multimodal Large Language Models (MLLMs) have demonstrated notable capabilities in general visual understanding and reasoning tasks. However, their deployment is hindered by substantial computational costs in both training and inference, limiting accessibility to the broader research and user communities. A straightforward solution is to leverage smaller pre-trained vision and language models, which inevitably causes a significant performance drop. In this paper, we demonstrate that it is possible to beat the scaling law and train a smaller but better MLLM by exploring more informative training data. Specifically, we introduce Bunny, a family of lightweight MLLMs with flexible vision and language backbones for efficient multimodal learning from condensed training data. Remarkably, our Bunny-3B outperforms the state-of-the-art large MLLMs, especially LLaVA-v1.5-13B, on multiple benchmarks. The code, models and data can be found at https://github.com/BAAI-DCAI/Bunny.
