Title: SPFormer: Enhancing Vision Transformer with Superpixel Representation

Jan 05, 2024

Jieru Mei, Liang-Chieh Chen, Alan Yuille, Cihang Xie

Abstract: In this work, we introduce SPFormer, a novel Vision Transformer enhanced by superpixel representation. Addressing the limitations of traditional Vision Transformers' fixed-size, non-adaptive patch partitioning, SPFormer employs superpixels that adapt to the image's content. This approach divides the image into irregular, semantically coherent regions, effectively capturing intricate details and applicable at both initial and intermediate feature levels. SPFormer, trainable end-to-end, exhibits superior performance across various benchmarks. Notably, it exhibits significant improvements on the challenging ImageNet benchmark, achieving a 1.4% increase over DeiT-T and 1.1% over DeiT-S respectively. A standout feature of SPFormer is its inherent explainability. The superpixel structure offers a window into the model's internal processes, providing valuable insights that enhance the model's interpretability. This level of clarity significantly improves SPFormer's robustness, particularly in challenging scenarios such as image rotations and occlusions, demonstrating its adaptability and resilience.
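To make the contrast between fixed patches and content-adaptive superpixels concrete, here is a minimal sketch. It is not SPFormer's method: the abstract describes superpixels learned end-to-end, whereas this example swaps in a classical SLIC-style k-means with hard assignments on a toy grayscale image. The function names (`grid_patches`, `superpixels`) and the parameters `iters` and `spatial_weight` are hypothetical, chosen only for illustration.

```python
# Illustrative only: a SLIC-style clustering, NOT the learned superpixels of SPFormer.
# Assumes image height and width are divisible by `patch`.

def grid_patches(height, width, patch):
    """Fixed partition: each pixel is assigned to the square patch covering it,
    regardless of image content (as in a standard ViT patch grid)."""
    cols = width // patch
    return [[(y // patch) * cols + (x // patch) for x in range(width)]
            for y in range(height)]


def superpixels(image, patch, iters=5, spatial_weight=0.1):
    """Content-adaptive partition: start from the patch grid, then alternate
    between computing cluster means and reassigning every pixel to the
    nearest cluster in (intensity, y, x) space."""
    height, width = len(image), len(image[0])
    labels = grid_patches(height, width, patch)
    k = (height // patch) * (width // patch)
    for _ in range(iters):
        # Accumulate intensity, y, x, and pixel count per cluster.
        sums = [[0.0, 0.0, 0.0, 0] for _ in range(k)]
        for y in range(height):
            for x in range(width):
                s = sums[labels[y][x]]
                s[0] += image[y][x]
                s[1] += y
                s[2] += x
                s[3] += 1
        # Mean of each non-empty cluster; empty clusters are dropped.
        centers = {i: (s[0] / s[3], s[1] / s[3], s[2] / s[3])
                   for i, s in enumerate(sums) if s[3] > 0}
        for y in range(height):
            for x in range(width):
                def cost(i, y=y, x=x):
                    ci, cy, cx = centers[i]
                    return ((image[y][x] - ci) ** 2
                            + spatial_weight * ((y - cy) ** 2 + (x - cx) ** 2))
                labels[y][x] = min(centers, key=cost)
    return labels


# Toy image: a vertical edge between columns 2 and 3,
# which does not line up with the 2x2 patch grid.
image = [[0, 0, 0, 1] for _ in range(4)]
print(grid_patches(4, 4, 2))  # patch boundary falls between columns 1 and 2
print(superpixels(image, 2))  # region boundary moves to the intensity edge
```

On this toy input the fixed grid groups columns 2 and 3 into the same patch even though they differ in intensity, while the clustered regions split along the edge. The `spatial_weight` term keeps regions compact; a larger value pulls the result back toward the regular grid.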
