Title: Integral Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection

May 19, 2022

Xiaosong Zhang, Feng Liu, Zhiliang Peng, Zonghao Guo, Fang Wan, Xiangyang Ji, Qixiang Ye

Abstract: Modern object detectors take advantage of pre-trained vision transformers by using them as backbone networks. However, apart from the backbone, other detector components, such as the detector head and the feature pyramid network, remain randomly initialized, which hinders consistency between detectors and pre-trained models. In this study, we propose to integrally migrate pre-trained transformer encoder-decoders (imTED) for object detection, constructing a feature extraction-operation path that is not only "fully pre-trained" but also consistent with pre-trained models. The essential improvements of imTED over existing transformer-based detectors are twofold: (1) it embeds the pre-trained transformer decoder into the detector head; and (2) it removes the feature pyramid network from the feature extraction path. These improvements significantly reduce the proportion of randomly initialized parameters and enhance the generalization capability of detectors. Experiments on the MS COCO dataset demonstrate that imTED consistently outperforms its counterparts by ~2.8% AP. Without bells and whistles, imTED improves the state of the art in few-shot object detection by up to 7.6% AP, demonstrating significantly higher generalization capability. Code will be made publicly available.

* 12 pages, 5 figures
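
The abstract's point about reducing the proportion of randomly initialized parameters can be illustrated with a small calculation. This is only a rough sketch, not the authors' code: the component names and every parameter count below are hypothetical placeholders chosen for illustration, and the small randomly initialized `task_output_layers` entry is an assumption, since the abstract says the proportion is reduced rather than eliminated.

```python
# Illustrative only: every parameter count below is a made-up placeholder
# (arbitrary units), not a figure reported in the paper.

def random_init_fraction(components):
    """Return the share of parameters that start from random initialization.

    `components` maps a component name to (parameter_count, is_pretrained).
    """
    total = sum(count for count, _ in components.values())
    random_init = sum(
        count for count, pretrained in components.values() if not pretrained
    )
    return random_init / total


# Backbone-only transfer: the encoder is pre-trained, while the feature
# pyramid network and the detector head are randomly initialized.
baseline = {
    "encoder_backbone": (80, True),
    "feature_pyramid": (5, False),
    "detector_head": (15, False),
}

# Integral migration as the abstract describes it: the pre-trained decoder
# is used in the detector head and the feature pyramid network is removed.
# The remaining random-init layers are an assumption for this sketch.
integral = {
    "encoder_backbone": (80, True),
    "decoder_as_head": (15, True),
    "task_output_layers": (1, False),
}

print(f"baseline: {random_init_fraction(baseline):.1%}")
print(f"integral: {random_init_fraction(integral):.1%}")
```

With these placeholder counts the randomly initialized share falls sharply once the decoder is reused and the feature pyramid network is dropped. The actual proportions depend on the real architecture and are not given in the abstract.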
