Open-vocabulary multiple object tracking aims to generalize trackers to categories unseen during training, enabling their application across a variety of real-world scenarios. However, the existing open-vocabulary tracker is constrained by its framework structure, its isolated frame-level perception, and its insufficient modal interactions, which hinder its performance in open-vocabulary classification and tracking. In this paper, we propose OVTR (End-to-End Open-Vocabulary Multiple Object Tracking with TRansformer), the first end-to-end open-vocabulary tracker that models motion, appearance, and category simultaneously. To achieve stable classification and continuous tracking, we design the CIP (Category Information Propagation) strategy, which establishes multiple high-level category information priors for subsequent frames. Additionally, we introduce a dual-branch structure for generalization capability and deep multimodal interaction, and incorporate protective strategies in the decoder to enhance performance. Experimental results show that our method surpasses previous trackers on the open-vocabulary MOT benchmark while achieving faster inference and significantly reduced preprocessing requirements. Moreover, transferring the model to another dataset demonstrates its strong adaptability. Models and code are released at https://github.com/jinyanglii/OVTR.
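To make the CIP idea concrete, below is a minimal, illustrative sketch of one way category information might be propagated across frames in a DETR-style tracker whose track queries persist between frames. The module name, the residual-fusion design, and all dimensions here are assumptions for illustration only, not OVTR's actual implementation (see the repository above for that).

```python
# Hypothetical sketch: inject the previous frame's category embedding into
# each persistent track query as a high-level prior for the next frame.
# All names and design choices are illustrative assumptions, not OVTR's code.
import torch
import torch.nn as nn


class CategoryPropagation(nn.Module):
    """Fuses frame t-1 category embeddings into frame t track queries."""

    def __init__(self, query_dim: int = 256, category_dim: int = 512):
        super().__init__()
        # Project CLIP-style category embeddings into the query space.
        self.project = nn.Linear(category_dim, query_dim)
        self.norm = nn.LayerNorm(query_dim)

    def forward(self, track_queries: torch.Tensor,
                prev_category_emb: torch.Tensor) -> torch.Tensor:
        # track_queries:     (num_tracks, query_dim)  queries carried to frame t
        # prev_category_emb: (num_tracks, category_dim) category info from t-1
        prior = self.project(prev_category_emb)
        # Add the category prior as a residual, giving the decoder a
        # category hint before it attends to the current frame's features.
        return self.norm(track_queries + prior)


# Usage: propagate category priors from frame t-1 into frame t's queries.
cip = CategoryPropagation()
queries = torch.randn(5, 256)      # 5 active track queries
cat_emb = torch.randn(5, 512)      # their category embeddings from frame t-1
queries_t = cip(queries, cat_emb)  # priors injected for the next decode step
```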