Title: Renaissance: Investigating the Pretraining of Vision-Language Encoders

Nov 11, 2024

Clayton Fields, Casey Kennington

Abstract: In the past several years there has been an explosion of available models for vision-language (VL) tasks. Unfortunately, the literature still leaves open a number of questions related to best practices in designing and training such models. In this paper we seek to answer several questions related to the pretraining of vision-language encoders through meta-analysis. In our first set of experiments, we show that we can save significant compute at no cost to downstream performance by freezing large parts of vision-language models during pretraining. In our second set of experiments, we examine the effect of basing a VL transformer on a vision model versus a text model. Additionally, we introduce a VL modeling platform called Renaissance, which we use to conduct all of the experiments. This program offers a great deal of flexibility in creating, training, and evaluating transformer encoders for VL modeling. The source code for Renaissance can be found at https://github.com/bsu-slim/renaissance.
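The first finding rests on freezing pretrained components so that they receive no parameter updates during VL pretraining. The framework-free sketch below illustrates only the bookkeeping involved. The `Module` and `VLEncoder` classes, the component names (`vision_encoder`, `text_encoder`, `fusion_layers`), and the parameter counts are all hypothetical and illustrative; they do not come from the paper or from Renaissance's code.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical stand-in for one network component (not Renaissance's API)."""
    name: str
    num_params: int
    trainable: bool = True


@dataclass
class VLEncoder:
    """Toy vision-language encoder assembled from named components."""
    modules: list[Module] = field(default_factory=list)

    def freeze(self, *names: str) -> None:
        # Frozen components need no parameter gradients and no optimizer state.
        known = {m.name for m in self.modules}
        unknown = set(names) - known
        if unknown:
            raise KeyError(f"unknown components: {sorted(unknown)}")
        for m in self.modules:
            if m.name in names:
                m.trainable = False

    def trainable_params(self) -> int:
        # Only these parameters are updated by the optimizer.
        return sum(m.num_params for m in self.modules if m.trainable)

    def total_params(self) -> int:
        return sum(m.num_params for m in self.modules)


# Illustrative sizes only; not figures reported in the paper.
model = VLEncoder([
    Module("vision_encoder", 86_000_000),
    Module("text_encoder", 110_000_000),
    Module("fusion_layers", 28_000_000),
])
model.freeze("vision_encoder", "text_encoder")
print(model.trainable_params())  # 28000000
print(f"{model.trainable_params() / model.total_params():.1%}")  # 12.5%
```

In PyTorch the corresponding step is disabling gradients on a submodule's parameters, for example with `module.requires_grad_(False)`. When the frozen components sit below every trainable layer, backpropagation can also stop before reaching them, which is one source of savings. The compute savings the paper reports come from its own experiments, not from this arithmetic.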
