Abstract: Tracking multiple athletes in sports videos is a very challenging Multi-Object Tracking (MOT) task, since athletes often share the same appearance and closely occlude one another, which turns the common occlusion problem into a severe duplicate detection problem. In this paper, duplicate detection is precisely defined as occlusion misreporting on the same athlete by multiple detection boxes in one frame. To address this problem, we design a novel transformer-based Duplicate Detection Decontaminator (D$^3$) for training and a dedicated matching algorithm, Rally-Hungarian (RH). Once duplicate detection occurs, D$^3$ immediately adjusts the training procedure by generating enhanced box losses. RH, which is inspired by team sports substitution rules, is particularly well suited to sports videos. Moreover, to complement existing tracking datasets that contain no shot changes, we release a new sports video dataset named RallyTrack. Extensive experiments on RallyTrack show that combining D$^3$ and RH improves tracking performance by 9.2 in MOTA and 4.5 in HOTA. Meanwhile, experiments on the MOT series and DanceTrack show that D$^3$ accelerates convergence during training, saving up to 80 percent of the original training time on MOT17. Finally, our model, which is trained only on volleyball videos, can be applied directly to basketball and soccer videos for multi-athlete tracking (MAT), which shows the superiority of our method. Our dataset is available at https://github.com/heruihr/rallytrack.