Title: SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

Feb 08, 2024

Peng Gao, Renrui Zhang, Chris Liu, Longtian Qiu, Siyuan Huang, Weifeng Lin, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin (+9 more)

Abstract: We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX. To improve the architecture and training efficiency, we modify the SPHINX framework by removing redundant visual encoders, bypassing fully-padded sub-images with skip tokens, and simplifying multi-stage training into a one-stage all-in-one paradigm. To fully unleash the potential of MLLMs, we assemble a comprehensive multi-domain and multimodal dataset covering publicly available resources in language, vision, and vision-language tasks. We further enrich this collection with our curated OCR-intensive and Set-of-Mark datasets, extending its diversity and generality. By training over different base LLMs, including TinyLlama1.1B, InternLM2-7B, LLaMA2-13B, and Mixtral8x7B, we obtain a spectrum of MLLMs that vary in parameter size and multilingual capabilities. Comprehensive benchmarking reveals a strong correlation between multi-modal performance and the data and parameter scales. Code and models are released at https://github.com/Alpha-VLLM/LLaMA2-Accessory
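
The idea of bypassing fully-padded sub-images with skip tokens can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no layout details, so the grid size, sub-image size, tokens per sub-image, the bottom/right padding convention, and the names `SKIP_TOKEN` and `build_visual_sequence` are all assumptions made for illustration.

```python
from typing import List

# Hypothetical placeholder; the real token name is not given in the abstract.
SKIP_TOKEN = "<skip>"


def build_visual_sequence(
    height: int,
    width: int,
    grid: int = 2,            # assumed: canvas is split into grid x grid sub-images
    sub_size: int = 224,      # assumed: side length of each square sub-image
    tokens_per_sub: int = 4,  # assumed: visual tokens produced per encoded sub-image
) -> List[str]:
    """Return a symbolic token sequence for an image padded onto a square canvas.

    The image is assumed to sit in the top-left corner of the canvas, with
    padding added to the bottom and right. A sub-image that lies entirely in
    the padded region contributes one skip token instead of a full block of
    visual tokens.
    """
    sequence: List[str] = []
    for row in range(grid):
        for col in range(grid):
            # The sub-image starts at (row * sub_size, col * sub_size). It is
            # fully padded when that origin is already outside the real image.
            fully_padded = row * sub_size >= height or col * sub_size >= width
            if fully_padded:
                sequence.append(SKIP_TOKEN)
            else:
                sequence.extend(
                    f"img_{row}_{col}_{k}" for k in range(tokens_per_sub)
                )
    return sequence


if __name__ == "__main__":
    wide = build_visual_sequence(height=224, width=448)
    print(len(wide), wide.count(SKIP_TOKEN))
```

Under these assumptions, a wide 224x448 image fills only the top row of the 2x2 grid, so the bottom two sub-images each collapse to a single skip token and the sequence is shorter than it would be if every sub-image were encoded in full.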
