Tracking data is a powerful tool for basketball teams in order to extract advanced semantic information and statistics that might lead to a performance boost. However, multi-person tracking is a challenging task to solve in single-camera video sequences, given the frequent occlusions and cluttering that occur in a restricted scenario. In this paper, a novel multi-scale detection method is presented, which is later used to extract geometric and content features, resulting in a multi-person video tracking system. Having built a dataset from scratch together with its ground truth (more than 10k bounding boxes), standard metrics are evaluated, obtaining notable results both in terms of detection (F1-score) and tracking (MOTA). The presented system could be used as a source of data gathering in order to extract useful statistics and semantic analyses a posteriori.