
Title: You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection

Jun 21, 2021

Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu


Abstract: Can Transformer perform 2D object-level recognition from a pure sequence-to-sequence perspective with minimal knowledge about the 2D spatial structure? To answer this question, we present You Only Look at One Sequence (YOLOS), a series of object detection models based on the naïve Vision Transformer with the fewest possible modifications as well as inductive biases. We find that YOLOS pre-trained on the mid-sized ImageNet-1k dataset only can already achieve competitive object detection performance on COCO, e.g., YOLOS-Base directly adopted from BERT-Base can achieve 42.0 box AP. We also discuss the impacts as well as limitations of current pre-train schemes and model scaling strategies for Transformer in vision through object detection. Code and model weights are available at https://github.com/hustvl/YOLOS.

* 18 pages, 7 tables, 5 figures. Add Appendix & some missing references

