BIM: Block-Wise Self-Supervised Learning with Masked Image Modeling

Nov 28, 2023

Yixuan Luo, Mengye Ren, Sai Qian Zhang

Abstract: Like masked language modeling (MLM) in natural language processing, masked image modeling (MIM) aims to extract valuable insights from image patches to enhance the feature extraction capabilities of the underlying deep neural network (DNN). Compared with other training paradigms such as supervised learning and unsupervised contrastive learning, MIM pretraining typically demands significant computational resources to handle large training batches (e.g., 4096). These memory and computation requirements pose a considerable challenge to its broad adoption. To mitigate this, we introduce a novel learning framework, termed Block-Wise Masked Image Modeling (BIM). The framework decomposes the MIM task into several sub-tasks with independent computation patterns, resulting in block-wise back-propagation operations instead of the traditional end-to-end approach. BIM maintains superior performance compared to conventional MIM while greatly reducing peak memory consumption. Moreover, BIM naturally enables the concurrent training of multiple DNN backbones of varying depths, yielding several trained backbones, each tailored to hardware platforms with different computing capabilities. This significantly reduces computational cost compared with training each DNN backbone individually. Our framework offers a promising solution for resource-constrained training of MIM.
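The block-wise back-propagation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function name `train_blockwise`, the scalar "blocks", the per-block scalar decoders, and the squared-error loss are all assumptions made for illustration. The sketch only shows the core idea that each block is updated from its own local reconstruction loss, with no gradient flowing back into earlier blocks.

```python
# Toy sketch of block-wise training with local losses (NOT the paper's code).
# Assumption: each "block" is one scalar weight with its own scalar decoder,
# and each local loss is a squared reconstruction error against `target`.

def train_blockwise(x, target, num_blocks=3, lr=0.1, steps=100):
    weights = [1.0] * num_blocks    # one encoder weight per block
    decoders = [1.0] * num_blocks   # one local decoder head per block
    history = []                    # local losses recorded at every step

    for _ in range(steps):
        h_in = x
        losses = []
        for i in range(num_blocks):
            # Forward pass through block i and its local decoder.
            h_out = weights[i] * h_in
            pred = decoders[i] * h_out
            err = pred - target
            losses.append(err * err)

            # Local gradients only: h_in is treated as a constant, so
            # nothing is propagated back into blocks 0..i-1.
            grad_d = 2.0 * err * h_out
            grad_w = 2.0 * err * decoders[i] * h_in

            decoders[i] -= lr * grad_d
            weights[i] -= lr * grad_w

            # Hand the activation to the next block; block i's
            # intermediate values are no longer needed after this point.
            h_in = h_out
        history.append(losses)
    return weights, decoders, history


weights, decoders, history = train_blockwise(x=1.0, target=0.5)
first, last = history[0], history[-1]
print("initial local losses:", first)
print("final local losses:  ", last)
```

In this sketch, the first `k` blocks together with decoder `k` form a usable model of depth `k`, which loosely mirrors the abstract's point that backbones of several depths are trained concurrently. Because each block's update needs only its own input and output, a real implementation could release a block's activations before moving to the next one, which is where the reduction in peak memory would come from.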