Title: u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model

Nov 09, 2023

Jinjin Xu, Liwu Xu, Yuzhe Yang, Xiang Li, Yanchun Xie, Yi-Jie Huang, Yaqian Li

Abstract: Recent advances such as LLaVA and MiniGPT-4 have successfully integrated visual information into large language models (LLMs), yielding inspiring outcomes and giving rise to a new generation of multi-modal LLMs (MLLMs). Nevertheless, these methods struggle with hallucinations and with mutual interference between tasks. To tackle these problems, we propose u-LLaVA, an efficient and accurate approach that adapts to downstream tasks by using an LLM as a bridge connecting multiple expert models. First, we incorporate a modality alignment module and multi-task modules into the LLM. Then, we reorganize or rebuild multiple types of public datasets to enable efficient modality alignment and instruction following. Finally, task-specific information is extracted from the trained LLM and passed to the corresponding modules to solve downstream tasks. The overall framework is simple and effective, and it achieves state-of-the-art performance across multiple benchmarks. We also make our model, the generated data, and the code base publicly available.
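The abstract describes two data flows: visual features are aligned into the LLM's embedding space, and task-specific information is later pulled out of the LLM and handed to expert modules. The stdlib-only Python sketch below illustrates that general "LLM as a bridge" pattern under stated assumptions; it is not the authors' implementation. The projector weights, the task tokens `<SEG>` and `<BOX>`, the `BridgeSketch` class, and the expert stubs are all hypothetical placeholders standing in for a trained vision encoder, an LLM, and real expert models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]


def linear(weights: List[Vector], x: Vector) -> Vector:
    """Dense projection: each output entry is the dot product of one weight row with x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


@dataclass
class BridgeSketch:
    # Hypothetical projector mapping vision-feature dim -> LLM embedding dim.
    projector: List[Vector]
    # Hypothetical expert modules keyed by the task token that triggers them.
    experts: Dict[str, Callable[[Vector], str]]

    def align(self, visual_tokens: List[Vector]) -> List[Vector]:
        # Modality alignment: project each visual feature into the LLM embedding space.
        return [linear(self.projector, t) for t in visual_tokens]

    def dispatch(self, tokens: List[str], hidden: List[Vector]) -> Dict[str, str]:
        # Route the LLM hidden state at each task token to its matching expert module.
        outputs: Dict[str, str] = {}
        for tok, h in zip(tokens, hidden):
            if tok in self.experts:
                outputs[tok] = self.experts[tok](h)
        return outputs


# Usage with toy values (placeholders, not trained weights).
bridge = BridgeSketch(
    projector=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
    experts={
        "<SEG>": lambda h: f"mask from {len(h)}-d query",
        "<BOX>": lambda h: f"box from {len(h)}-d query",
    },
)
aligned = bridge.align([[1.0, 2.0, 3.0]])  # [[1.0, 5.0]]
outs = bridge.dispatch(["a", "<SEG>", "cat"], [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
# {'<SEG>': 'mask from 2-d query'}
```

Keying experts by a special output token keeps the LLM itself task-agnostic: adding a new downstream task in this sketch only means registering another token and expert, which is one plausible way to limit the mutual interference between tasks that the abstract mentions.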
