Title: 2-D SSM: A General Spatial Layer for Visual Transformers

Jun 11, 2023

Ethan Baron, Itamar Zimerman, Lior Wolf

Abstract: A central objective in computer vision is to design models with appropriate 2-D inductive bias. Desiderata for 2-D inductive bias include two-dimensional position awareness, dynamic spatial locality, and translation and permutation invariance. To address these goals, we leverage an expressive variation of the multidimensional State Space Model (SSM). Our approach introduces efficient parameterization, accelerated computation, and a suitable normalization scheme. Empirically, we observe that incorporating our layer at the beginning of each transformer block of Vision Transformers (ViT) significantly enhances performance for multiple ViT backbones and across datasets. The new layer is effective even with a negligible number of additional parameters and negligible additional inference time. Ablation studies and visualizations demonstrate that the layer has a strong 2-D inductive bias. For example, vision transformers equipped with our layer exhibit effective performance even without positional encoding.
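To make the idea of a multidimensional SSM concrete, here is a minimal sketch of a naive 2-D state-space recurrence over an image grid. The abstract does not give the layer's equations, so this assumes a generic Roesser-style formulation with a horizontal and a vertical hidden state. The function name `roesser_2d_ssm` and all parameter names are hypothetical, and the sketch omits the efficient parameterization, accelerated computation, and normalization that the abstract mentions.

```python
import numpy as np

def roesser_2d_ssm(u, A1, A2, A3, A4, B1, B2, C1, C2):
    """Naive 2-D state-space recurrence over a single-channel H x W grid.

    Assumed (Roesser-style) update rules, not taken from the abstract:
      xh[i, j+1] = A1 @ xh[i, j] + A2 @ xv[i, j] + B1 * u[i, j]
      xv[i+1, j] = A3 @ xh[i, j] + A4 @ xv[i, j] + B2 * u[i, j]
      y[i, j]    = C1 @ xh[i, j] + C2 @ xv[i, j]
    """
    H, W = u.shape
    n = A1.shape[0]
    xh = np.zeros((H, W + 1, n))  # horizontal state, carried left to right
    xv = np.zeros((H + 1, W, n))  # vertical state, carried top to bottom
    y = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            h, v = xh[i, j], xv[i, j]
            y[i, j] = C1 @ h + C2 @ v
            xh[i, j + 1] = A1 @ h + A2 @ v + B1 * u[i, j]
            xv[i + 1, j] = A3 @ h + A4 @ v + B2 * u[i, j]
    return y
```

In this sketch each output position mixes information arriving from the left and from above, which is one simple way a layer can be aware of 2-D position without an explicit positional encoding. The double loop costs O(H·W·n²) and is meant only for illustration.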

* 16 pages, 5 figures
