Title: Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity

Nov 29, 2021

Byungseok Roh, JaeWoong Shin, Wuhyun Shin, Saehoon Kim

Abstract: DETR is the first end-to-end object detector using a transformer encoder-decoder architecture. It demonstrates competitive performance but low computational efficiency on high-resolution feature maps. The subsequent work, Deformable DETR, improves the efficiency of DETR by replacing dense attention with deformable attention, achieving 10x faster convergence and improved performance. Deformable DETR uses multi-scale features to improve performance; however, the number of encoder tokens increases by 20x compared to DETR, and the computation cost of the encoder attention remains a bottleneck. In our preliminary experiment, we observe that detection performance hardly deteriorates even if only a subset of the encoder tokens is updated. Inspired by this observation, we propose Sparse DETR, which selectively updates only the tokens expected to be referenced by the decoder, thus helping the model detect objects effectively. In addition, we show that applying an auxiliary detection loss on the selected tokens in the encoder improves performance while minimizing computational overhead. We validate that Sparse DETR achieves better performance than Deformable DETR even with only 10% of the encoder tokens on the COCO dataset. Although only the encoder tokens are sparsified, the total computation cost decreases by 38% and the frames per second (FPS) increase by 42% compared to Deformable DETR. Code is available at https://github.com/kakaobrain/sparse-detr
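
The selective update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes each encoder token already has a scalar score estimating how likely the decoder is to reference it; the abstract does not say how such scores are obtained. The names `select_top_tokens`, `sparse_update`, `keep_ratio`, and `update` are hypothetical and introduced only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def select_top_tokens(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Return the indices of the highest-scoring tokens, in ascending order.

    Hypothetical helper: `scores` are assumed to be given, one per token.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Keep at least one token, rounding the budget up.
    k = max(1, math.ceil(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_update(
    tokens: Sequence[float],
    scores: Sequence[float],
    keep_ratio: float,
    update: Callable[[float], float],
) -> Tuple[List[float], List[int]]:
    """Apply `update` only to the selected tokens; pass the rest through unchanged.

    `update` stands in for an encoder layer (assumption: in a real model it
    would be an attention block operating on feature vectors, not scalars).
    """
    selected = select_top_tokens(scores, keep_ratio)
    out = list(tokens)
    for i in selected:
        out[i] = update(tokens[i])
    return out, selected


if __name__ == "__main__":
    tokens = [1.0] * 8
    scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.0, 0.4, 0.5]
    updated, kept = sparse_update(tokens, scores, keep_ratio=0.25, update=lambda x: x + 1.0)
    print(kept)     # indices of the tokens that were refined
    print(updated)  # all other tokens are unchanged
```

In this toy setting the cost of `update` scales with the number of kept tokens rather than with the full token count, which is the intuition behind sparsifying the encoder. A scalar per token is used here purely to keep the sketch dependency-free.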

View paper on

OpenReview
