Title: Too Large; Data Reduction for Vision-Language Pre-Training

Jun 01, 2023

Alex Jinpeng Wang, Kevin Qinghong Lin, David Junhao Zhang, Stan Weixian Lei, Mike Zheng Shou

Abstract: This paper examines the problems of severe image-text misalignment and high redundancy in widely used large-scale Vision-Language Pre-Training (VLP) datasets. To address these issues, we propose an efficient and straightforward Vision-Language learning algorithm called TL;DR, which aims to compress existing large VLP data into a small, high-quality set. Our approach consists of two major steps. First, a codebook-based encoder-decoder captioner is developed to select representative samples. Second, a new caption is generated to complement the original captions of the selected samples, mitigating the text-image misalignment problem while maintaining uniqueness. As a result, TL;DR reduces a large dataset to a small set of high-quality data that can serve as an alternative pre-training dataset, significantly speeding up the time-consuming pre-training process. Specifically, TL;DR can compress the mainstream VLP datasets at a high ratio, e.g., reducing the well-cleaned CC3M dataset from 2.82M to 0.67M (~24%) and the noisy YFCC15M from 15M to 2.5M (~16.7%). Extensive experiments with three popular VLP models over seven downstream tasks show that VLP models trained on the compressed dataset provided by TL;DR can achieve similar or even better results compared with training on the full-scale dataset. The code will be made available at https://github.com/showlab/data-centric.vlp.
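
The compression ratios quoted in the abstract can be checked with simple arithmetic. The snippet below is only an illustrative sketch: the helper `retained_fraction` and the `datasets` dictionary are hypothetical names introduced here, not part of the authors' code, and the sizes are the rounded figures from the abstract.

```python
def retained_fraction(original_size: float, reduced_size: float) -> float:
    """Return the fraction of the original dataset kept after reduction."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return reduced_size / original_size


# Rounded dataset sizes as stated in the abstract: (full size, reduced size).
datasets = {
    "CC3M": (2.82e6, 0.67e6),
    "YFCC15M": (15e6, 2.5e6),
}

# Fraction of each dataset that remains after reduction.
ratios = {
    name: retained_fraction(full, small)
    for name, (full, small) in datasets.items()
}

for name, ratio in ratios.items():
    print(f"{name}: keeps {ratio:.1%} of the original data")
```

With these rounded inputs the retained fractions come out at roughly 23.8% for CC3M and 16.7% for YFCC15M, in line with the approximate figures given in the abstract.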

* Work in progress. Code: https://github.com/showlab/data-centric.vlp
