Title: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Dec 31, 2020

Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip H. S. Torr (+1 more)

Figures 1-4 for Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Abstract: Most recent semantic segmentation methods adopt a fully-convolutional network (FCN) with an encoder-decoder architecture. The encoder progressively reduces the spatial resolution and learns more abstract/semantic visual concepts with larger receptive fields. Since context modeling is critical for segmentation, the latest efforts have focused on increasing the receptive field, through either dilated/atrous convolutions or inserted attention modules. However, the encoder-decoder-based FCN architecture remains unchanged. In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer (i.e., without convolution and resolution reduction) to encode an image as a sequence of patches. With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR). Extensive experiments show that SETR achieves a new state of the art on ADE20K (50.28% mIoU) and Pascal Context (55.83% mIoU), and competitive results on Cityscapes. In particular, we achieve first place (44.42% mIoU) on the highly competitive ADE20K test server leaderboard.
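The sequence-to-sequence view in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: it only shows an image being split into a sequence of flattened patches, one single-head self-attention step in which every patch attends to every other patch (global context, no resolution reduction), and a toy decoder that maps tokens back to a 2D label map. The function names, the patch size of 16, the embedding width, the random weights, and the nearest-neighbour upsampling are all assumptions made for illustration; positional embeddings, multi-head attention, MLP blocks, and layer normalization are omitted.

```python
import numpy as np


def image_to_patch_sequence(image, patch_size):
    """Split an (H, W, C) image into a sequence of flattened patches.

    Returns an array of shape (L, patch_size * patch_size * C) with
    L = (H // patch_size) * (W // patch_size), in row-major patch order.
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, p, p, c)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the whole sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])       # (L, L): every patch vs. every patch
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


def sequence_to_label_map(tokens, w_cls, grid_hw, patch_size):
    """Toy decoder (assumption): per-token class logits, reshaped to the patch
    grid and upsampled to image size by nearest-neighbour repetition."""
    gh, gw = grid_hw
    logits = tokens @ w_cls                       # (L, num_classes)
    labels = logits.argmax(axis=-1).reshape(gh, gw)
    return np.repeat(np.repeat(labels, patch_size, axis=0), patch_size, axis=1)


# Hypothetical sizes and random weights, chosen only to make the sketch runnable.
rng = np.random.default_rng(0)
H, W, C, P, D, NUM_CLASSES = 32, 48, 3, 16, 8, 4

image = rng.random((H, W, C))
seq = image_to_patch_sequence(image, P)           # shape (6, 768)
w_embed = rng.normal(size=(seq.shape[1], D)) * 0.02
tokens = seq @ w_embed                            # linear patch embedding, shape (6, 8)

w_q, w_k, w_v = (rng.normal(size=(D, D)) for _ in range(3))
attended, attn = self_attention(tokens, w_q, w_k, w_v)

w_cls = rng.normal(size=(D, NUM_CLASSES))
label_map = sequence_to_label_map(attended, w_cls, (H // P, W // P), P)
```

The sequence length stays at 6 tokens from input to output, which is the "without resolution reduction" property: the spatial layout is recovered only at the end by reshaping the sequence back to the 2 x 3 patch grid.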

* project page at https://fudan-zvg.github.io/SETR/
