The widescale deployment of Autonomous Vehicles (AV) seems to be imminent despite many safety challenges that are yet to be resolved. It is well known that there are no universally agreed Verification and Validation (VV) methodologies to guarantee absolute safety, which is crucial for the acceptance of this technology. Existing standards focus on deterministic processes where the validation requires only a set of test cases that cover the requirements. Modern autonomous vehicles will undoubtedly include machine learning and probabilistic techniques that require a much more comprehensive testing regime due to the non-deterministic nature of the operating design domain. A rigourous statistical validation process is an essential component required to address this challenge. Most research in this area focuses on evaluating system performance in large scale real-world data gathering exercises (number of miles travelled), or randomised test scenarios in simulation. This paper presents a new approach to compute the statistical characteristics of a system's behaviour by biasing automatically generated test cases towards the worst case scenarios, identifying potential unsafe edge cases.We use reinforcement learning (RL) to learn the behaviours of simulated actors that cause unsafe behaviour measured by the well established RSS safety metric. We demonstrate that by using the method we can more efficiently validate a system using a smaller number of test cases by focusing the simulation towards the worst case scenario, generating edge cases that correspond to unsafe situations.