Abstract: Autonomous driving has developed rapidly and shown promising performance, driven by recent advances in hardware and deep learning techniques. High-quality datasets are fundamental to developing reliable autonomous driving algorithms. Previous dataset surveys either covered only a limited number of datasets or lacked a detailed investigation of dataset characteristics. In this paper, we analyze the annotation processes, existing labeling tools, and annotation quality of datasets, showing the importance of establishing a standard annotation pipeline. We also thoroughly analyze the impact of geographical and adversarial environmental conditions on the performance of autonomous driving systems. Moreover, we present the data distribution of several vital datasets and discuss their pros and cons accordingly. Additionally, this paper provides a comprehensive analysis of publicly available traffic simulators. Beyond surveying traffic datasets, the paper aims to provide context on the current capabilities of traffic simulators and their specific contributions to autonomous vehicle simulation and development. Furthermore, it discusses future directions and the increasing importance of synthetic data generation in simulators for improving training and building effective simulations. Finally, we discuss the current challenges and development trends of future autonomous driving datasets.