Abstract:An automatic gun detection system can detect potential gun-related violence at an early stage that is of paramount importance for citizens security. In the whole system, object detection algorithm is the key to perceive the environment so that the system can detect dangerous objects such as pistols and rifles. However, mainstream deep learning-based object detection algorithms depend heavily on large-scale high-quality annotated samples, and the existing gun datasets are characterized by low resolution, little contextual information and little data volume. To promote the development of security, this work presents a new challenging dataset called YouTube Gun Detection Dataset (YouTube-GDD). Our dataset is collected from 343 high-definition YouTube videos and contains 5000 well-chosen images, in which 16064 instances of gun and 9046 instances of person are annotated. Compared to other datasets, YouTube-GDD is "dynamic", containing rich contextual information and recording shape changes of the gun during shooting. To build a baseline for gun detection, we evaluate YOLOv5 on YouTube-GDD and analyze the influence of additional related annotated information on gun detection. YouTube-GDD and subsequent updates will be released at https://github.com/UCAS-GYX/YouTube-GDD.