Title: Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking

May 31, 2022

Peng Dai, Yiqiang Feng, Renliang Weng, Changshui Zhang

Abstract: The recent trend in multiple object tracking (MOT) is heading towards leveraging deep learning to boost the tracking performance. In this paper, we propose a novel solution named TransSTAM, which leverages Transformer to effectively model both the appearance features of each object and the spatial-temporal relationships among objects. TransSTAM consists of two major parts: (1) The encoder utilizes the powerful self-attention mechanism of Transformer to learn discriminative features for each tracklet; (2) The decoder adopts the standard cross-attention mechanism to model the affinities between the tracklets and the detections by taking both spatial-temporal and appearance features into account. TransSTAM has two major advantages: (1) It is solely based on the encoder-decoder architecture and enjoys a compact network design, hence being computationally efficient; (2) It can effectively learn spatial-temporal and appearance features within one model, hence achieving better tracking accuracy. The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA with respect to previous state-of-the-art approaches on all the benchmarks. Our code is available at https://github.com/icicle4/TranSTAM.
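
To make the idea of attention-based affinities between tracklets and detections concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes each tracklet and each detection has already been reduced to a single feature vector (in the paper this would be the learned output of the encoder and of the detection branch), and it omits the learned query/key projections. The names `affinity_matrix` and `greedy_match`, the toy feature values, the softmax normalization over detections, and the greedy assignment step are all illustrative assumptions, not details taken from the abstract.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def affinity_matrix(tracklets, detections):
    """Scaled dot-product scores between every tracklet (query) and
    every detection (key), normalized per tracklet with a softmax.

    Assumption: inputs are plain feature vectors of equal length;
    no learned projections are applied in this sketch.
    """
    scale = math.sqrt(len(tracklets[0]))
    rows = []
    for trk in tracklets:
        scores = [
            sum(a * b for a, b in zip(trk, det)) / scale
            for det in detections
        ]
        rows.append(softmax(scores))
    return rows


def greedy_match(aff):
    """One-to-one assignment by descending affinity.

    Assumption: a simple greedy stand-in for whatever association
    step a real tracker uses on top of the affinities.
    """
    candidates = sorted(
        ((aff[i][j], i, j)
         for i in range(len(aff))
         for j in range(len(aff[0]))),
        reverse=True,
    )
    used_trk, used_det, matches = set(), set(), {}
    for _, i, j in candidates:
        if i not in used_trk and j not in used_det:
            matches[i] = j
            used_trk.add(i)
            used_det.add(j)
    return matches


# Hypothetical 4-dimensional features for 2 tracklets and 3 detections.
tracklets = [[1.0, 0.0, 0.5, 0.1], [0.0, 1.0, 0.1, 0.6]]
detections = [
    [0.1, 0.9, 0.0, 0.5],
    [0.9, 0.1, 0.4, 0.0],
    [0.3, 0.3, 0.3, 0.3],
]

aff = affinity_matrix(tracklets, detections)
matches = greedy_match(aff)
print(sorted(matches.items()))  # [(0, 1), (1, 0)]
```

In this toy run, tracklet 0 is paired with detection 1 and tracklet 1 with detection 0, while detection 2 stays unmatched and could start a new track. A real system would compute the scores from learned spatial-temporal and appearance embeddings rather than hand-written vectors.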
