Title: DPT: Deformable Patch-based Transformer for Visual Recognition

Jul 30, 2021

Zhiyang Chen, Yousong Zhu, Chaoyang Zhao, Guosheng Hu, Wei Zeng, Jinqiao Wang, Ming Tang

Abstract: Transformers have achieved great success in computer vision, but how to split an image into patches remains an open problem. Existing methods usually use a fixed-size patch embedding, which might destroy the semantics of objects. To address this problem, we propose a new Deformable Patch (DePatch) module, which learns to adaptively split images into patches with different positions and scales in a data-driven way rather than using predefined fixed patches. In this way, our method can better preserve the semantics in patches. DePatch works as a plug-and-play module that can easily be incorporated into different transformers for end-to-end training. We term the resulting DePatch-embedded transformer the Deformable Patch-based Transformer (DPT) and conduct extensive evaluations of DPT on image classification and object detection. Results show that DPT achieves 81.9% top-1 accuracy on ImageNet classification, and 43.7% box mAP with RetinaNet and 44.3% with Mask R-CNN on MSCOCO object detection. Code is available at: https://github.com/CASIA-IVA-Lab/DPT.

* In Proceedings of the 29th ACM International Conference on Multimedia (MM '21)
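
To make the idea of patches with "different positions and scales" concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. It assumes each patch is described by a fixed grid center plus an offset and a half-size, and that the patch is read out by bilinearly sampling a small k x k grid of points inside the resulting box. In the paper's setting the offset and scale would be predicted by a learned layer; here they are plain function arguments. The names `bilinear`, `sample_patch`, `dx`, `dy`, `half_w`, `half_h` and `k` are hypothetical and chosen for illustration only.

```python
import math

def bilinear(img, x, y):
    """Bilinearly interpolate a 2D list `img` (rows of floats) at real-valued (x, y)."""
    h, w = len(img), len(img[0])
    # Clamp so that points outside the image reuse the border values.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def sample_patch(img, cx, cy, dx, dy, half_w, half_h, k=2):
    """Sample a k x k grid from a box centered at (cx + dx, cy + dy).

    (cx, cy) is the fixed grid center; (dx, dy) and (half_w, half_h) stand in
    for the per-patch offset and scale (assumed inputs, not learned here).
    """
    ncx, ncy = cx + dx, cy + dy
    samples = []
    for i in range(k):
        row = []
        for j in range(k):
            # Sample points sit at the centers of k x k equal cells in the box.
            px = ncx - half_w + (2 * j + 1) * half_w / k
            py = ncy - half_h + (2 * i + 1) * half_h / k
            row.append(bilinear(img, px, py))
        samples.append(row)
    return samples

# Toy 4 x 4 "image" whose value at (x, y) is 4*y + x.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]

# A patch at its default position, and the same patch shifted right by 0.5 px.
fixed = sample_patch(image, 1.5, 1.5, 0.0, 0.0, 1.0, 1.0)    # [[5.0, 6.0], [9.0, 10.0]]
shifted = sample_patch(image, 1.5, 1.5, 0.5, 0.0, 1.0, 1.0)  # [[5.5, 6.5], [9.5, 10.5]]
```

Because the sampled values vary smoothly with the offset and half-size, a formulation like this lets gradients reach whatever predicts those parameters, which is consistent with the end-to-end training the abstract describes.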
