Abstract:Multi-object tracking (MOT) has been dominated by the use of track by detection approaches due to the success of convolutional neural networks (CNNs) on detection in the last decade. As the datasets and bench-marking sites are published, research direction has shifted towards yielding best accuracy on generic scenarios including re-identification (reID) of objects while tracking. In this study, we narrow the scope of MOT for surveillance by providing a dedicated dataset of pedestrians and focus on in-depth analyses of well performing multi-object trackers to observe the weak and strong sides of state-of-the-art (SOTA) techniques for real-world applications. For this purpose, we introduce SOMPT22 dataset; a new set for multi person tracking with annotated short videos captured from static cameras located on poles with 6-8 meters in height positioned for city surveillance. This provides a more focused and specific benchmarking of MOT for outdoor surveillance compared to public MOT datasets. We analyze MOT trackers classified as one-shot and two-stage with respect to the way of use of detection and reID networks on this new dataset. The experimental results of our new dataset indicate that SOTA is still far from high efficiency, and single-shot trackers are good candidates to unify fast execution and accuracy with competitive performance. The dataset will be available at: sompt22.github.io