Abstract: Object detection is one of the most fundamental tasks in computer vision and is widely used as a component of pose estimation, object tracking, and instance segmentation models. To collect training data for object detection models efficiently, many datasets obtain their unannotated data in video format, and an annotator then has to draw a bounding box around each object in every image. Annotating every frame of a video is costly and inefficient, since many frames contain very similar information for the model to learn from. Selecting the most informative frames of a video to annotate is therefore a highly practical problem, yet it has attracted little research attention. In this paper, we propose a novel active learning algorithm for object detection models to tackle this problem. The proposed algorithm measures both the classification and the localization informativeness of unlabelled data and aggregates them. Two novel localization informativeness measurements are proposed that exploit the temporal information in video frames. Furthermore, a weight curve is proposed to avoid querying adjacent frames. The proposed active learning algorithm was evaluated with multiple configurations on the MuPoTS and FootballPD datasets.
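To make the frame-selection idea concrete, the following is a minimal sketch and not the paper's actual formulation. It assumes that each frame already has a classification score and a localization score, that the two are aggregated by a weighted sum (`alpha` is a hypothetical mixing weight), and that the weight curve is Gaussian-shaped with a hypothetical width `sigma`. The abstract does not specify any of these choices, and the function name `select_frames` is illustrative only.

```python
import math


def select_frames(cls_scores, loc_scores, budget, alpha=0.5, sigma=5.0):
    """Greedily pick up to `budget` frame indices to send for annotation.

    Assumptions (not taken from the paper):
      * informativeness = alpha * classification + (1 - alpha) * localization
      * after a frame is picked, every frame's score is multiplied by
        1 - exp(-d^2 / (2 * sigma^2)), where d is its distance in frames
        from the picked frame, so close neighbours are strongly down-weighted.
    """
    if len(cls_scores) != len(loc_scores):
        raise ValueError("score lists must have the same length")

    # Aggregate the two informativeness measures per frame.
    scores = [alpha * c + (1.0 - alpha) * l
              for c, l in zip(cls_scores, loc_scores)]

    selected = []
    for _ in range(min(budget, len(scores))):
        # Pick the most informative frame that has not been selected yet.
        best = max((i for i in range(len(scores)) if i not in selected),
                   key=lambda i: scores[i])
        selected.append(best)

        # Apply the (assumed) weight curve centred on the picked frame.
        for i in range(len(scores)):
            d = i - best
            scores[i] *= 1.0 - math.exp(-(d * d) / (2.0 * sigma * sigma))

    return selected


# Toy example: frames 1 and 2 are both highly informative but adjacent,
# while frame 19 is moderately informative and far away.
toy_scores = [0.1, 0.9, 0.85] + [0.1] * 16 + [0.6]
picked = select_frames(toy_scores, toy_scores, budget=2, sigma=3.0)
print(picked)
```

In this toy run the second pick is frame 19 rather than frame 2, because the down-weighting around frame 1 reduces the score of its immediate neighbour far more than that of a distant frame. A plain top-k selection on the same scores would instead return the two adjacent frames.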