Title: UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

Nov 18, 2020

Zhigang Dai, Bolun Cai, Yugeng Lin, Junying Chen

Abstract: Object detection with transformers (DETR) achieves performance competitive with Faster R-CNN using a transformer encoder-decoder architecture. Inspired by the success of pre-training transformers in natural language processing, we propose a pretext task named random query patch detection to pre-train DETR without supervision (UP-DETR) for object detection. Specifically, we randomly crop patches from a given image and feed them to the decoder as queries. The model is pre-trained to detect these query patches in the original image. During pre-training, we address two critical issues: multi-task learning and multi-query localization. (1) To balance the classification and localization tasks in the pretext task, we freeze the CNN backbone and propose a patch feature reconstruction branch that is jointly optimized with patch detection. (2) To perform multi-query localization, we first introduce UP-DETR with a single query patch and then extend it to multiple query patches using object query shuffle and an attention mask. In our experiments, UP-DETR significantly boosts the performance of DETR, with faster convergence and higher precision on the PASCAL VOC and COCO datasets. The code will be available soon.
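The data side of the random query patch detection pretext task can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The sketch does the following:

* crops random patches from an image;
* turns each patch box into a normalized detection target;
* splits the object queries into one equal-sized group per patch, with an optional shuffle;
* builds an attention mask that stops queries in different groups from attending to each other;
* scores a reconstructed patch feature against its target.

Everything below is hypothetical: the helper names (`sample_query_patches`, `to_cxcywh`, `assign_queries_to_patches`, `group_attention_mask`, `feature_reconstruction_loss`), the patch-size range, the reading of "object query shuffle" as a random permutation of group assignments, and the loss form (squared L2 distance between L2-normalized feature vectors).

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical box convention: (x0, y0, x1, y1) in pixel coordinates.
Box = Tuple[int, int, int, int]


def sample_query_patches(img_w: int, img_h: int, num_patches: int,
                         min_frac: float = 0.1, max_frac: float = 0.5,
                         rng: Optional[random.Random] = None) -> List[Box]:
    """Randomly crop `num_patches` boxes that lie fully inside the image.

    Requires 0 < min_frac <= max_frac <= 1. The side-length range is an
    illustrative assumption, not a value from the paper.
    """
    rng = rng or random.Random()
    boxes = []
    for _ in range(num_patches):
        w = max(1, int(img_w * rng.uniform(min_frac, max_frac)))
        h = max(1, int(img_h * rng.uniform(min_frac, max_frac)))
        x0 = rng.randint(0, img_w - w)  # keeps x0 + w <= img_w
        y0 = rng.randint(0, img_h - h)  # keeps y0 + h <= img_h
        boxes.append((x0, y0, x0 + w, y0 + h))
    return boxes


def to_cxcywh(box: Box, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Convert a pixel box into a normalized (cx, cy, w, h) regression target."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2 / img_w, (y0 + y1) / 2 / img_h,
            (x1 - x0) / img_w, (y1 - y0) / img_h)


def assign_queries_to_patches(num_queries: int, num_patches: int,
                              shuffle: bool = True,
                              rng: Optional[random.Random] = None) -> List[int]:
    """Return, for each object query, the index of the patch (group) it serves.

    Queries are split into equal-sized groups, one per patch. `shuffle`
    randomly permutes which queries land in which group (our assumed
    reading of "object query shuffle").
    """
    if num_patches <= 0 or num_queries % num_patches != 0:
        raise ValueError("num_queries must be a multiple of a positive num_patches")
    per_group = num_queries // num_patches
    groups = [q // per_group for q in range(num_queries)]
    if shuffle:
        (rng or random.Random()).shuffle(groups)
    return groups


def group_attention_mask(groups: Sequence[int]) -> List[List[bool]]:
    """mask[i][j] is True when query i is blocked from attending to query j,
    i.e. when the two queries serve different patches."""
    n = len(groups)
    return [[groups[i] != groups[j] for j in range(n)] for i in range(n)]


def feature_reconstruction_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Squared L2 distance between L2-normalized feature vectors.

    `target` stands in for the patch feature from the frozen CNN backbone.
    This loss form is an assumption made for illustration.
    """
    def normalize(v: Sequence[float]) -> List[float]:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
        return [x / norm for x in v]

    p, t = normalize(pred), normalize(target)
    return sum((a - b) ** 2 for a, b in zip(p, t))


if __name__ == "__main__":
    rng = random.Random(0)
    boxes = sample_query_patches(640, 480, num_patches=10, rng=rng)
    targets = [to_cxcywh(b, 640, 480) for b in boxes]
    groups = assign_queries_to_patches(100, 10, rng=rng)
    mask = group_attention_mask(groups)
    print(len(targets), len(mask))  # prints: 10 100
```

In the example, there are 100 object queries and 10 query patches. Each patch is therefore handled by 10 queries. The mask lets each query attend only to the 10 queries in its own group, itself included.

The sketch leaves out the model side. In a full model, each patch would presumably pass through the frozen backbone and be pooled into a feature vector. That vector would then be combined with its group's object queries before decoding.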
