Abstract:As a variety of automated collision prevention systems gain presence within personal vehicles, rating and differentiating the automated safety performance of car models has become increasingly important for consumers, manufacturers, and insurers. In 2023, Swiss Re and partners initiated an eight-month long vehicle testing campaign conducted on a recognized UNECE type approval authority and Euro NCAP accredited proving ground in Germany. The campaign exposed twelve mass-produced vehicle models and one prototype vehicle fitted with collision prevention systems to a selection of safety-critical traffic scenarios representative of United States and European Union accident landscape. In this paper, we compare and evaluate the relative safety performance of these thirteen collision prevention systems (hardware and software stack) as demonstrated by this testing campaign. We first introduce a new scoring system which represents a test system's predicted impact on overall real-world collision frequency and reduction of collision impact energy, weighted based on the real-world relevance of the test scenario. Next, we introduce a novel metric that quantifies the realism of the protocol and confirm that our test protocol is a plausible representation of real-world driving. Finally, we find that the prototype system in its pre-release state outperforms the mass-produced (post-consumer-release) vehicles in the majority of the tested scenarios on the test track.
Abstract:This work introduces a framework to diagnose the strengths and shortcomings of Autonomous Vehicle (AV) collision avoidance technology with synthetic yet realistic potential collision scenarios adapted from real-world, collision-free data. Our framework generates counterfactual collisions with diverse crash properties, e.g., crash angle and velocity, between an adversary and a target vehicle by adding perturbations to the adversary's predicted trajectory from a learned AV behavior model. Our main contribution is to ground these adversarial perturbations in realistic behavior as defined through the lens of data-alignment in the behavior model's parameter space. Then, we cluster these synthetic counterfactuals to identify plausible and representative collision scenarios to form the basis of a test suite for downstream AV system evaluation. We demonstrate our framework using two state-of-the-art behavior prediction models as sources of realistic adversarial perturbations, and show that our scenario clustering evokes interpretable failure modes from a baseline AV policy under evaluation.
Abstract:The public, regulators, and domain experts alike seek to understand the effect of deployed SAE level 4 automated driving system (ADS) technologies on safety. The recent expansion of ADS technology deployments is paving the way for early stage safety impact evaluations, whereby the observational data from both an ADS and a representative benchmark fleet are compared to quantify safety performance. In January 2024, a working group of experts across academia, insurance, and industry came together in Washington, DC to discuss the current and future challenges in performing such evaluations. A subset of this working group then met, virtually, on multiple occasions to produce this paper. This paper presents the RAVE (Retrospective Automated Vehicle Evaluation) checklist, a set of fifteen recommendations for performing and evaluating retrospective ADS performance comparisons. The recommendations are centered around the concepts of (1) quality and validity, (2) transparency, and (3) interpretation. Over time, it is anticipated there will be a large and varied body of work evaluating the observed performance of these ADS fleets. Establishing and promoting good scientific practices benefits the work of stakeholders, many of whom may not be subject matter experts. This working group's intentions are to: i) strengthen individual research studies and ii) make the at-large community more informed on how to evaluate this collective body of work.
Abstract:This study compares the safety of autonomous- and human drivers. It finds that the Waymo One autonomous service is significantly safer towards other road users than human drivers are, as measured via collision causation. The result is determined by comparing Waymo's third party liability insurance claims data with mileage- and zip-code-calibrated Swiss Re (human driver) private passenger vehicle baselines. A liability claim is a request for compensation when someone is responsible for damage to property or injury to another person, typically following a collision. Liability claims reporting and their development is designed using insurance industry best practices to assess crash causation contribution and predict future crash contributions. In over 3.8 million miles driven without a human being behind the steering wheel in rider-only (RO) mode, the Waymo Driver incurred zero bodily injury claims in comparison with the human driver baseline of 1.11 claims per million miles (cpmm). The Waymo Driver also significantly reduced property damage claims to 0.78 cpmm in comparison with the human driver baseline of 3.26 cpmm. Similarly, in a more statistically robust dataset of over 35 million miles during autonomous testing operations (TO), the Waymo Driver, together with a human autonomous specialist behind the steering wheel monitoring the automation, also significantly reduced both bodily injury and property damage cpmm compared to the human driver baselines.
Abstract:Autonomous Vehicles (AVs) have the potential to provide numerous societal benefits, such as decreased road accidents and increased overall transportation efficiency. However, quantifying the risk associated with AVs is challenging due to the lack of historical data and the rapidly evolving technology. This paper presents a data-driven framework for comparing the risk of different AVs' behaviors in various operational design domains (ODDs), based on counterfactual simulations of "misbehaving" road users. We introduce the concept of counterfactual safety margin, which represents the minimum deviation from normal behavior that could lead to a collision. This concept helps to find the most critical scenarios but also to assess the frequency and severity of risk of AVs. We show that the proposed methodology is applicable even when the AV's behavioral policy is unknown -- through worst- and best-case analyses -- making the method useful also to external third-party risk assessors. Our experimental results demonstrate the correlation between the safety margin, the driving policy quality, and the ODD shedding light on the relative risk associated with different AV providers. This work contributes to AV safety assessment and aids in addressing legislative and insurance concerns surrounding this emerging technology.