Title: Prismer: A Vision-Language Model with An Ensemble of Experts

Mar 12, 2023

Shikun Liu, Linxi Fan, Edward Johns, Zhiding Yu, Chaowei Xiao, Anima Anandkumar

Abstract: Recent vision-language models have shown impressive multi-modal generation capabilities. However, they typically require training huge models on massive datasets. As a more scalable alternative, we introduce Prismer, a data- and parameter-efficient vision-language model that leverages an ensemble of domain experts. Prismer requires training only a small number of components; the majority of network weights are inherited from readily available, pre-trained domain experts and kept frozen during training. By leveraging experts from a wide range of domains, we show that Prismer can efficiently pool this expert knowledge and adapt it to various vision-language reasoning tasks. In our experiments, Prismer achieves fine-tuned and few-shot learning performance competitive with current state-of-the-art models, whilst requiring up to two orders of magnitude less training data. Code is available at https://github.com/NVlabs/prismer.
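The general pattern the abstract describes (keep pre-trained experts frozen, train only a small component that pools their outputs) can be illustrated with a toy sketch in plain Python. This is not Prismer's architecture. The "experts" here are hypothetical fixed random linear maps standing in for pre-trained networks, the pooling component is a hypothetical single linear head (`Adapter`) trained with SGD on a toy regression target, and all sizes are made up for illustration.

```python
import copy
import random


def make_expert(in_dim: int, out_dim: int, seed: int) -> list[list[float]]:
    """Hypothetical stand-in for a pre-trained domain expert: a fixed random linear map."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def matvec(weights: list[list[float]], x: list[float]) -> list[float]:
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


class FrozenEnsemble:
    """Concatenates the outputs of several experts; its weights are never updated."""

    def __init__(self, experts: list[list[list[float]]]):
        self.experts = experts

    def features(self, x: list[float]) -> list[float]:
        out: list[float] = []
        for expert in self.experts:
            out.extend(matvec(expert, x))
        return out

    def num_params(self) -> int:
        return sum(len(e) * len(e[0]) for e in self.experts)


class Adapter:
    """Hypothetical small trainable linear head that pools the concatenated expert features."""

    def __init__(self, feat_dim: int):
        self.w = [0.0] * feat_dim
        self.b = 0.0

    def predict(self, f: list[float]) -> float:
        return sum(wi * fi for wi, fi in zip(self.w, f)) + self.b

    def sgd_step(self, f: list[float], y: float, lr: float) -> None:
        # Gradient step on 0.5 * (pred - y)^2; only the adapter's w and b change.
        err = self.predict(f) - y
        self.w = [wi - lr * err * fi for wi, fi in zip(self.w, f)]
        self.b -= lr * err

    def num_params(self) -> int:
        return len(self.w) + 1


def mse(model: Adapter, ensemble: FrozenEnsemble, data) -> float:
    return sum((model.predict(ensemble.features(x)) - y) ** 2 for x, y in data) / len(data)


# Toy setup (all sizes hypothetical): 3 experts, each mapping a 16-d input to 4 features.
IN_DIM, OUT_DIM, N_EXPERTS = 16, 4, 3
ensemble = FrozenEnsemble([make_expert(IN_DIM, OUT_DIM, seed=s) for s in range(N_EXPERTS)])
adapter = Adapter(feat_dim=OUT_DIM * N_EXPERTS)

rng = random.Random(42)
data = []
for _ in range(200):
    x = [rng.uniform(-1.0, 1.0) for _ in range(IN_DIM)]
    data.append((x, sum(x[:4])))  # toy downstream target

frozen_before = copy.deepcopy(ensemble.experts)
loss_before = mse(adapter, ensemble, data)
for _ in range(20):
    for x, y in data:
        adapter.sgd_step(ensemble.features(x), y, lr=0.01)
loss_after = mse(adapter, ensemble, data)

print(f"frozen params: {ensemble.num_params()}, trainable params: {adapter.num_params()}")
print(f"train MSE: {loss_before:.3f} -> {loss_after:.3f}")
```

In this toy setting the frozen experts hold 192 parameters while the adapter trains only 13, and the expert weights are identical before and after training. The design choice being illustrated is that gradient updates touch only the pooling head, so the cost of adapting to a new task scales with the small trainable part rather than with the experts.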

Tech Report. Project Page: https://shikun.io/projects/prismer Code: https://github.com/NVlabs/prismer v2: fixed an incorrect training cost estimate and the zero-shot NoCaps performance of SimVLM.