Abstract: Virtual scenario-based testing methods for validating autonomous driving systems are predominantly centred on collision avoidance and lack a comprehensive approach to evaluating optimal driving behaviour holistically. Furthermore, current validation approaches do not align with the authorisation and monitoring requirements put forth by regulatory bodies. We address these validation gaps by outlining a universal evaluation framework that incorporates the notion of careful and competent driving, unifies behavioural competencies and evaluation criteria, and can be applied at both the scenario-specific and the aggregate behaviour level. This framework can be leveraged to evaluate optimal driving in scenario-based testing and for post-deployment monitoring to ensure continual compliance with regulation and safety standards.
Abstract: This paper introduces CRITICAL, a novel closed-loop framework for autonomous vehicle (AV) training and testing. CRITICAL stands out for its ability to generate diverse scenarios, focusing on critical driving situations that target specific learning and performance gaps identified in the Reinforcement Learning (RL) agent. The framework achieves this by integrating real-world traffic dynamics, driving behavior analysis, surrogate safety measures, and an optional Large Language Model (LLM) component. We show that establishing a closed feedback loop between the data generation pipeline and the training process can enhance the learning rate during training, elevate overall system performance, and augment safety resilience. Our evaluations, conducted using Proximal Policy Optimization (PPO) and the HighwayEnv simulation environment, demonstrate noticeable performance improvements with the integration of critical case generation and LLM analysis, indicating CRITICAL's potential to improve the robustness of AV systems and streamline the generation of critical scenarios. This ultimately serves to hasten the development of AV agents, expand the general scope of RL training, and improve validation efforts for AV safety.
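
The closed feedback loop between scenario generation and training described above can be illustrated with a toy sketch. This is not CRITICAL's implementation: it uses no PPO, no HighwayEnv, and no surrogate safety measures. The names `reweight`, `closed_loop`, and `skill`, the scenario labels, and the update constants are all hypothetical, chosen only to show the idea of biasing generated scenarios toward the agent's observed weaknesses.

```python
import random


def reweight(failure_rates, floor=0.05):
    """Turn per-scenario failure rates into sampling probabilities.

    `floor` (an assumed constant) keeps every scenario type in the mix
    so that none is dropped entirely once the agent handles it well.
    """
    raw = {name: rate + floor for name, rate in failure_rates.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


def closed_loop(skill, rounds=5, episodes=200, seed=0):
    """Toy loop: evaluate, reweight toward weak scenarios, train, repeat.

    `skill[name]` stands in for the agent's success probability in one
    scenario type; "training" nudges it upward per sampled episode.
    """
    rng = random.Random(seed)
    skill = dict(skill)  # work on a copy, leave the caller's dict alone
    for _ in range(rounds):
        # 1. Evaluate: here the failure rate is simply 1 - skill.
        failure_rates = {name: 1.0 - s for name, s in skill.items()}
        # 2. Generate: bias the scenario distribution toward failures.
        probs = reweight(failure_rates)
        names = list(probs)
        batch = rng.choices(names, weights=[probs[n] for n in names], k=episodes)
        # 3. Train: each sampled episode closes a little of the remaining gap.
        for name in batch:
            skill[name] += 0.01 * (1.0 - skill[name])
    return skill


if __name__ == "__main__":
    print(closed_loop({"cut_in": 0.4, "lane_keep": 0.9}))
```

In this sketch the weaker scenario type receives most of the training episodes in early rounds, and the sampling distribution flattens as the gap closes. A real system would replace the `skill` stand-in with measured outcomes from simulation rollouts.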