Title: Demystify Transformers & Convolutions in Modern Image Deep Networks

Nov 10, 2022

Jifeng Dai, Min Shi, Weiyun Wang, Sitong Wu, Linjie Xing, Wenhai Wang, Xizhou Zhu, Lewei Lu, Jie Zhou, Xiaogang Wang (+2 more)

Abstract: The recent success of vision transformers has inspired a series of vision backbones with novel feature transformation paradigms, which report steady performance gains. Although the novel feature transformation designs are often claimed as the source of the gains, some backbones may also benefit from advanced engineering techniques, which makes it hard to identify the real gain from the key feature transformation operators. In this paper, we aim to identify the real gain of popular convolution and attention operators and study them in depth. We observe that the main difference among these feature transformation modules, e.g., attention or convolution, lies in how they aggregate spatial features, the so-called "spatial token mixer" (STM). Hence, we first build a unified architecture to eliminate the unfair impact of different engineering techniques, and then fit STMs into this architecture for comparison. Based on various experiments on upstream/downstream tasks and an analysis of inductive bias, we find that the engineering techniques boost performance significantly, but a performance gap still exists among different STMs. The detailed analysis also reveals some interesting findings about different STMs, such as their effective receptive fields and their behavior in invariance tests. The code and trained models will be publicly available at https://github.com/OpenGVLab/STM-Evaluation
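To illustrate the idea of fitting different STMs into one shared architecture, here is a minimal sketch in plain Python. It is not the paper's architecture or code: the block below keeps only a residual connection, has no normalization, MLP, or learned weights, and works on a 1D token sequence. All names (`local_average_mixer`, `global_attention_mixer`, `unified_block`) are hypothetical and chosen for illustration only.

```python
import math
from typing import Callable, List

Tokens = List[List[float]]  # N tokens, each a C-dimensional feature vector
Mixer = Callable[[Tokens], Tokens]


def local_average_mixer(x: Tokens, radius: int = 1) -> Tokens:
    """Convolution-like STM (assumed stand-in): average over a fixed local window."""
    n = len(x)
    out = []
    for i in range(n):
        window = x[max(0, i - radius): min(n, i + radius + 1)]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


def global_attention_mixer(x: Tokens) -> Tokens:
    """Attention-like STM (assumed stand-in): softmax-weighted sum over all tokens.

    No learned query/key/value projections are used; the tokens play all three roles.
    """
    scale = 1.0 / math.sqrt(len(x[0]))
    out = []
    for q in x:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in x]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[c] for w, v in zip(weights, x)) / total
            for c in range(len(q))
        ])
    return out


def unified_block(x: Tokens, mixer: Mixer) -> Tokens:
    """Shared block: everything except the mixer is identical across variants."""
    mixed = mixer(x)
    return [[a + b for a, b in zip(xi, mi)] for xi, mi in zip(x, mixed)]


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
for name, stm in [("local", local_average_mixer),
                  ("attention", global_attention_mixer)]:
    y = unified_block(tokens, stm)
    print(name, [[round(v, 3) for v in t] for t in y])
```

Because `unified_block` is the same for both variants, any difference in the output comes from the mixer alone, which is the kind of controlled comparison the abstract describes.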
