Title: Masked Frequency Modeling for Self-Supervised Visual Pre-Training

Jun 15, 2022

Jiahao Xie, Wei Li, Xiaohang Zhan, Ziwei Liu, Yew Soon Ong, Chen Change Loy

Abstract: We present Masked Frequency Modeling (MFM), a unified frequency-domain-based approach for self-supervised pre-training of visual models. Instead of randomly inserting mask tokens into the input embeddings in the spatial domain, we shift the perspective to the frequency domain. Specifically, MFM first masks out a portion of the frequency components of the input image and then predicts the missing frequencies on the frequency spectrum. Our key insight is that, because of heavy spatial redundancy, predicting masked components in the frequency domain is better suited to revealing underlying image patterns than predicting masked patches in the spatial domain. Our findings suggest that, with the right configuration of the mask-and-predict strategy, both the structural information within high-frequency components and the low-level statistics among their low-frequency counterparts are useful for learning good representations. For the first time, MFM demonstrates that, for both ViT and CNN, a simple non-Siamese framework can learn meaningful representations even when using none of the following: (i) extra data, (ii) extra model, (iii) mask token. Experimental results on ImageNet and several robustness benchmarks show the competitive performance and advanced robustness of MFM compared with recent masked image modeling approaches. We also comprehensively investigate the effectiveness of classical image restoration tasks for representation learning from a unified frequency perspective and reveal their intriguing relations with our MFM approach. Project page: https://www.mmlab-ntu.com/project/mfm/index.html.
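
The mask-and-predict step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the mask shape, the loss, or the network, so the circular low-pass/high-pass mask, the absolute-difference loss, and the names `radial_mask`, `mask_frequencies`, and `masked_frequency_loss` are all assumptions made for illustration. The "prediction" here is just an array standing in for a model output.

```python
import numpy as np

def radial_mask(height, width, radius):
    """Boolean mask that is True inside a centered circle, i.e. on the
    low frequencies of an fftshift-ed spectrum (assumed mask shape)."""
    ys, xs = np.ogrid[:height, :width]
    cy, cx = height // 2, width // 2
    return (ys - cy) ** 2 + (xs - cx) ** 2 <= radius ** 2

def mask_frequencies(image, radius, keep="low"):
    """Remove one frequency band from a 2-D grayscale image.

    Returns the corrupted spatial-domain image, the full (shifted)
    spectrum of the original, and a boolean map of removed components.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    low = radial_mask(*image.shape, radius)
    kept = low if keep == "low" else ~low
    # Zero out the masked components and return to the spatial domain.
    corrupted = np.fft.ifft2(np.fft.ifftshift(spectrum * kept)).real
    return corrupted, spectrum, ~kept

def masked_frequency_loss(prediction, target_spectrum, removed):
    """Mean absolute spectral error, measured only on the removed
    components (a hypothetical loss choice, not taken from the paper)."""
    pred_spectrum = np.fft.fftshift(np.fft.fft2(prediction))
    return np.abs(pred_spectrum - target_spectrum)[removed].mean()

# Usage: mask the high frequencies of a random image, then score two
# stand-in "predictions" against the true spectrum.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
corrupted, spectrum, removed = mask_frequencies(image, radius=8, keep="low")
loss_unrestored = masked_frequency_loss(corrupted, spectrum, removed)
loss_perfect = masked_frequency_loss(image, spectrum, removed)
```

In this sketch a model would take `corrupted` as input and be trained to lower the loss on the `removed` components; passing `keep="high"` instead masks the low frequencies, which corresponds to the other masking configuration the abstract mentions.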

View paper on OpenReview