Title: Fast Convergence of DETR with Spatially Modulated Co-Attention

Aug 05, 2021

Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li

Abstract: The recently proposed Detection Transformer (DETR) model successfully applies the Transformer to object detection and achieves performance comparable to two-stage object detection frameworks such as Faster-RCNN. However, DETR suffers from slow convergence: training DETR from scratch requires 500 epochs to reach high accuracy. To accelerate its convergence, we propose a simple yet effective scheme for improving the DETR framework, namely the Spatially Modulated Co-Attention (SMCA) mechanism. The core idea of SMCA is to conduct location-aware co-attention in DETR by constraining co-attention responses to be high near initially estimated bounding box locations. Our proposed SMCA increases DETR's convergence speed by replacing the original co-attention mechanism in the decoder while keeping other operations in DETR unchanged. Furthermore, by integrating multi-head and scale-selection attention designs into SMCA, our fully-fledged SMCA achieves better performance than DETR with a dilated convolution-based backbone (45.6 mAP at 108 epochs vs. 43.3 mAP at 500 epochs). We perform extensive ablation studies on the COCO dataset to validate SMCA. Code is released at https://github.com/gaopengcuhk/SMCA-DETR.
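The core idea described in the abstract, co-attention whose responses are pushed to be high near an estimated box location, can be illustrated with a minimal pure-Python sketch. This is not the released SMCA code. It assumes a Gaussian-like spatial prior that is added in log space to the dot-product attention logits; the abstract does not specify that form, and the function names and the `beta` parameter are hypothetical, introduced only for illustration.

```python
import math


def gaussian_log_prior(height, width, center, scale, beta=1.0):
    """Log of a Gaussian-like weight map over an H x W grid (row-major).

    center = (cx, cy) and scale = (sx, sy) stand in for an initially
    estimated box centre and size. The Gaussian form is an assumption.
    """
    cx, cy = center
    sx, sy = scale
    log_prior = []
    for y in range(height):
        for x in range(width):
            log_prior.append(
                -((x - cx) ** 2) / (beta * sx ** 2)
                - ((y - cy) ** 2) / (beta * sy ** 2)
            )
    return log_prior


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def modulated_co_attention(query, keys, values, log_prior):
    """Single-query, single-head attention with a spatial bias on the logits."""
    dim = len(query)
    logits = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) + bias
        for key, bias in zip(keys, log_prior)
    ]
    weights = softmax(logits)
    out_dim = len(values[0])
    output = [
        sum(w * value[j] for w, value in zip(weights, values))
        for j in range(out_dim)
    ]
    return output, weights


# Toy example: a 4 x 4 feature map whose keys are all identical, so plain
# dot-product attention would be uniform and only the prior shapes the result.
height, width = 4, 4
keys = [[0.5, 0.5] for _ in range(height * width)]
values = [[float(i)] for i in range(height * width)]
prior = gaussian_log_prior(height, width, center=(1, 2), scale=(1.0, 1.0))
output, weights = modulated_co_attention([1.0, 0.0], keys, values, prior)
peak_index = weights.index(max(weights))  # row-major index of the prior centre
```

In this toy setup the attention weights concentrate around the grid cell at the assumed box centre, which is the location-aware behaviour the abstract describes. The multi-head and scale-selection designs mentioned in the abstract are not covered by this sketch.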

* Accepted by ICCV2021
