https://github.com/dyhBUPT/HIT.
Multi-Object Tracking (MOT) aims to detect and associate all targets of given classes across frames. Current dominant solutions, e.g. ByteTrack and StrongSORT++, follow the hybrid pipeline, which first accomplish most of the associations in an online manner, and then refine the results using offline tricks such as interpolation and global link. While this paradigm offers flexibility in application, the disjoint design between the two stages results in suboptimal performance. In this paper, we propose the Hierarchical IoU Tracking framework, dubbed HIT, which achieves unified hierarchical tracking by utilizing tracklet intervals as priors. To ensure the conciseness, only IoU is utilized for association, while discarding the heavy appearance models, tricky auxiliary cues, and learning-based association modules. We further identify three inconsistency issues regarding target size, camera movement and hierarchical cues, and design corresponding solutions to guarantee the reliability of associations. Though its simplicity, our method achieves promising performance on four datasets, i.e., MOT17, KITTI, DanceTrack and VisDrone, providing a strong baseline for future tracking method design. Moreover, we experiment on seven trackers and prove that HIT can be seamlessly integrated with other solutions, whether they are motion-based, appearance-based or learning-based. Our codes will be released at