Title: TrTr: Visual Tracking with Transformer

May 09, 2021

Moju Zhao, Kei Okada, Masayuki Inaba

Figures 1-4 for TrTr: Visual Tracking with Transformer

Abstract: Template-based discriminative trackers are currently the dominant tracking methods due to their robustness and accuracy, and the Siamese-network-based methods that depend on a cross-correlation operation between features extracted from template and search images show state-of-the-art tracking performance. However, a general cross-correlation operation can only capture relationships between local patches in the two feature maps. In this paper, we propose a novel tracker network based on the Transformer encoder-decoder architecture, a powerful attention mechanism, to capture global and rich contextual interdependencies. In this new architecture, the features of the template image are processed by a self-attention module in the encoder to learn strong context information, which is then sent to the decoder to compute cross-attention with the search image features, which are themselves processed by another self-attention module. In addition, we design the classification and regression heads using the output of the Transformer to localize the target based on a shape-agnostic anchor. We extensively evaluate our tracker, TrTr, on the VOT2018, VOT2019, OTB-100, UAV, NfS, TrackingNet, and LaSOT benchmarks, and our method performs favorably against state-of-the-art algorithms. Training code and pretrained models are available at https://github.com/tongtybj/TrTr.
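
The data flow described in the abstract (self-attention over template features in the encoder, then self-attention over search features followed by cross-attention against the encoded template in the decoder) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head, no learned projections, no layer normalization, no positional encodings, and no feed-forward layers, and the names `attention`, `encode_template`, and `decode_search` are hypothetical helpers chosen only for illustration. Feature maps are represented as flat lists of token vectors.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    # Scaled dot-product attention: every query attends to ALL keys,
    # which is what gives global rather than local-patch context.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def add(a, b):
    # Element-wise residual connection between two token lists.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def encode_template(template_tokens):
    # Encoder (simplified): self-attention over template features.
    return add(template_tokens,
               attention(template_tokens, template_tokens, template_tokens))

def decode_search(search_tokens, memory):
    # Decoder (simplified): self-attention over search features, then
    # cross-attention with queries from the search branch and
    # keys/values from the encoded template.
    s = add(search_tokens, attention(search_tokens, search_tokens, search_tokens))
    return add(s, attention(s, memory, memory))

# Toy features: 3 template tokens and 4 search tokens, each 2-dimensional.
template = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
search = [[0.5, 0.5], [1.0, 0.0], [0.0, 0.0], [0.2, 0.9]]

memory = encode_template(template)
fused = decode_search(search, memory)  # one fused vector per search token
```

The output keeps one vector per search location, which is the kind of per-location representation that classification and regression heads could consume; the heads themselves are not sketched here because the abstract does not specify their form.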

* 11 pages, 5 figures
