Title: MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video

Nov 24, 2021

David Junhao Zhang, Kunchang Li, Yunpeng Chen, Yali Wang, Shashwat Chandra, Yu Qiao, Luoqi Liu, Mike Zheng Shou

Abstract: Self-attention has become an integral component of recent network architectures, e.g., Transformer, that dominate major image and video benchmarks. This is because self-attention can flexibly model long-range information. For the same reason, researchers have recently attempted to revive the Multi-Layer Perceptron (MLP) and have proposed a few MLP-Like architectures that show great potential. However, current MLP-Like architectures are not good at capturing local details and lack a progressive understanding of core details in images and videos. To overcome this issue, we propose a novel MorphMLP architecture that focuses on capturing local details in the low-level layers and gradually shifts its focus to long-term modeling in the high-level layers. Specifically, we design a Fully-Connected-Like layer, dubbed MorphFC, consisting of two morphable filters that gradually grow their receptive fields along the height and width dimensions. More interestingly, we propose to flexibly adapt our MorphFC layer to the video domain. To the best of our knowledge, we are the first to create an MLP-Like backbone for learning video representations. Finally, we conduct extensive experiments on image classification, semantic segmentation, and video classification. MorphMLP, a self-attention-free backbone, can be as powerful as, and even outperform, self-attention-based models.
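The idea of a fully-connected-like filter whose receptive field grows with depth can be illustrated with a toy example. The sketch below is not the authors' MorphFC implementation: it assumes a single-channel 1-D row, splits it into chunks of a hypothetical length `chunk_len`, and applies one shared fully connected map per chunk. The names `chunk_fc`, `affected_positions`, and `chunk_len` are invented for illustration. Perturbing one input position and checking which outputs change shows that a larger chunk length gives each output a wider receptive field.

```python
import random


def chunk_fc(row, chunk_len, weight):
    """Apply one shared fully connected map to each chunk of a 1-D row.

    row:       list of W floats (single channel, a simplifying assumption)
    chunk_len: hypothetical chunk length L; W must be divisible by L
    weight:    L x L matrix (list of lists) shared by every chunk
    """
    if len(row) % chunk_len != 0:
        raise ValueError("row width must be divisible by chunk_len")
    out = []
    for start in range(0, len(row), chunk_len):
        chunk = row[start:start + chunk_len]
        # Every output in a chunk mixes all inputs of that same chunk only.
        out.extend(sum(w * x for w, x in zip(w_row, chunk)) for w_row in weight)
    return out


def affected_positions(width, chunk_len, position, seed=0):
    """Return the output indices that change when one input is perturbed."""
    rng = random.Random(seed)
    # Strictly positive weights, so every in-chunk output visibly changes.
    weight = [[rng.uniform(0.5, 1.5) for _ in range(chunk_len)]
              for _ in range(chunk_len)]
    base = [0.0] * width
    bumped = list(base)
    bumped[position] = 1.0
    y0 = chunk_fc(base, chunk_len, weight)
    y1 = chunk_fc(bumped, chunk_len, weight)
    return [i for i, (a, b) in enumerate(zip(y0, y1)) if a != b]


if __name__ == "__main__":
    # Small chunks act locally; larger chunks approach a global mixing layer.
    for chunk_len in (2, 4, 8):
        print(chunk_len, affected_positions(8, chunk_len, position=3))
```

In this toy setting, stacking layers with increasing `chunk_len` would mimic the shift from local detail in early layers to longer-range mixing in later ones that the abstract describes; how the paper handles channels, the height dimension, and video frames is not covered here.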

* preprint version
