Abstract: Deployed SAE level 4+ Automated Driving Systems (ADS) without a human driver currently operate as ride-hailing fleets on surface streets in the United States. This current use case and future applications of the technology will determine where and when the fleets operate, potentially resulting in a divergence between the ADS driving distribution and that of a human benchmark population within a given locality. Existing benchmarks for evaluating ADS performance have matched ADS and benchmark driving exposure only at the county level when computing crash rates. This study presents a novel methodology for constructing dynamic human benchmarks that adjust for spatial and temporal variations in driving distribution between an ADS and the overall human-driven fleet. Dynamic benchmarks were generated using human police-reported crash data, human vehicle miles traveled (VMT) data, and over 20 million miles of Waymo's rider-only (RO) operational data accumulated across three US counties. The spatial adjustment revealed significant differences in adjusted crash rates compared to unadjusted benchmarks across severity levels, with these differences ranging from 10% to 47% higher in San Francisco County, 12% to 20% higher in Maricopa County, and 7% lower to 34% higher in Los Angeles County. The time-of-day adjustment in San Francisco, limited to this region due to data availability, resulted in adjusted crash rates 2% lower to 16% higher than unadjusted rates, depending on severity level. The findings underscore the importance of adjusting for spatial and temporal confounders in benchmarking analysis, which ultimately contributes to a more equitable benchmark for ADS performance evaluations.
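A minimal sketch of the spatial reweighting idea described in this abstract, assuming hypothetical zone-level inputs (zone names, rates, and mileages are illustrative placeholders, not the study's data): the human benchmark is recomputed so that each zone's human crash rate is weighted by the ADS's share of miles driven in that zone.

```python
# Illustrative sketch of a spatially adjusted (dynamic) human benchmark.
# Zone labels, crash rates, and mileage figures are hypothetical.

# Human crash rate (incidents per million miles) in each zone, computed
# from police-reported crashes divided by human VMT in that zone.
human_rate_ipmm = {"zone_a": 5.2, "zone_b": 2.9, "zone_c": 4.1}

# ADS rider-only miles driven in each zone (the exposure weights).
ads_miles = {"zone_a": 4.0e6, "zone_b": 12.0e6, "zone_c": 6.0e6}

total_ads_miles = sum(ads_miles.values())

# Adjusted benchmark: weight each zone's human rate by the ADS's
# distribution of miles rather than the human fleet's distribution.
adjusted_benchmark = sum(
    human_rate_ipmm[z] * ads_miles[z] / total_ads_miles for z in ads_miles
)

print(f"Spatially adjusted human benchmark: {adjusted_benchmark:.2f} IPMM")
```

The same reweighting applies temporally by replacing zones with time-of-day bins.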
Abstract: The public, regulators, and domain experts alike seek to understand the effect of deployed SAE level 4 automated driving system (ADS) technologies on safety. The recent expansion of ADS deployments is paving the way for early-stage safety impact evaluations, whereby observational data from both an ADS and a representative benchmark fleet are compared to quantify safety performance. In January 2024, a working group of experts across academia, insurance, and industry came together in Washington, DC to discuss the current and future challenges in performing such evaluations. A subset of this working group then met virtually on multiple occasions to produce this paper. This paper presents the RAVE (Retrospective Automated Vehicle Evaluation) checklist, a set of fifteen recommendations for performing and evaluating retrospective ADS performance comparisons. The recommendations center on the concepts of (1) quality and validity, (2) transparency, and (3) interpretation. Over time, a large and varied body of work evaluating the observed performance of these ADS fleets is anticipated. Establishing and promoting good scientific practices benefits the work of stakeholders, many of whom may not be subject matter experts. The working group's intentions are to (i) strengthen individual research studies and (ii) better inform the community at large on how to evaluate this collective body of work.
Abstract: This paper examines the safety performance of the Waymo Driver, an SAE level 4 automated driving system (ADS) used in a rider-only (RO) ride-hailing application without a human driver, either in the vehicle or remotely. ADS crash data were derived from NHTSA's Standing General Order (SGO) reporting over 7.14 million RO miles through the end of October 2023 in Phoenix, AZ, San Francisco, CA, and Los Angeles, CA. This study is one of the first to compare overall crashed vehicle rates using only RO data (as opposed to ADS testing with a human behind the wheel) to a human benchmark that also corrects for the underreporting and unequal reporting thresholds documented in the literature. Considering all locations together, the any-injury-reported crashed vehicle rate was 0.41 incidents per million miles (IPMM) for the ADS vs. 2.78 IPMM for the human benchmark, an 85% reduction, or a 6.8 times lower rate. Police-reported crashed vehicle rates for all locations together were 2.1 IPMM for the ADS vs. 4.85 IPMM for the human benchmark, a 57% reduction, or a 2.3 times lower rate. Police-reported and any-injury-reported crashed vehicle rate reductions for the ADS were statistically significant in San Francisco and Phoenix individually, as well as combined across all locations. The comparison in Los Angeles, which to date has low mileage and no reported events, was not statistically significant. In general, the Waymo ADS had a lower any-property-damage-or-injury rate than the human benchmarks. Given imprecision in the benchmark estimate and multiple potential sources of underreporting biasing the benchmarks, caution should be taken when interpreting the results of the any-property-damage-or-injury comparison. Together, these crash-rate results should be interpreted as a directional and continuous confidence-growth indicator, alongside other methodologies, in a safety case approach.
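The rate arithmetic behind these comparisons is simple enough to verify directly; a short sketch using the figures quoted in the abstract (the variable names are illustrative):

```python
# Verify the any-injury-reported comparison quoted in the abstract.
ads_ipmm = 0.41    # ADS rate over 7.14M rider-only miles
human_ipmm = 2.78  # human benchmark rate

reduction = 1 - ads_ipmm / human_ipmm  # fraction of crashes avoided
ratio = human_ipmm / ads_ipmm          # how many times lower the ADS rate is

print(f"Reduction: {reduction:.0%}")      # -> 85%
print(f"Rate ratio: {ratio:.1f}x lower")  # -> 6.8x
```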
Abstract: With ride-hailing services using fully automated driving systems (ADS; SAE level 4) expanding in the US, we are now approaching an inflection point where the process of retrospectively evaluating ADS safety impact can start to yield statistically credible conclusions. An ADS safety impact measurement requires a comparison to a "benchmark" crash rate. This study aims to address, update, and extend the existing literature by leveraging police-reported crashes to generate human crash rates for multiple geographic areas with current ADS deployments. All of the data leveraged is publicly accessible, and the benchmark determination methodology is intended to be repeatable and transparent. Generating a benchmark that is comparable to ADS crash data involves certain challenges, including data selection, handling underreporting and reporting thresholds, identifying the population of drivers and vehicles to compare against, choosing an appropriate severity level to assess, and matching crash and mileage exposure data. Consequently, we identify essential steps when generating benchmarks and present our analyses against the backdrop of existing ADS benchmark literature. One analysis presented applies established underreporting correction methodology to publicly available, police-reported human driver crash data to improve comparability to publicly available ADS crash data. We also identify important dependencies in controlling for geographic region, road type, and vehicle type, and show how failing to control for these features can bias results. This body of work aims to contribute to the ability of the community (researchers, regulators, industry, and experts) to reach consensus on how to estimate accurate benchmarks.
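As a concrete illustration of the underreporting correction step, here is a minimal sketch assuming a constant reporting probability per severity level; the counts and probabilities are hypothetical placeholders, not estimates from this paper or its cited literature:

```python
# Sketch of an underreporting correction for police-reported crash counts.
# All numbers below are illustrative placeholders.

reported_crashes = {"any_injury": 1200, "property_damage_only": 3400}

# Assumed probability that a crash of each severity is police-reported.
reporting_probability = {"any_injury": 0.60, "property_damage_only": 0.30}

human_vmt_millions = 900.0  # human vehicle miles traveled, in millions

for severity, count in reported_crashes.items():
    corrected_count = count / reporting_probability[severity]
    rate = corrected_count / human_vmt_millions
    print(f"{severity}: corrected rate = {rate:.2f} crashes per million miles")
```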
Abstract: This study compares the safety of autonomous and human drivers. It finds that the Waymo One autonomous service is significantly safer towards other road users than human drivers are, as measured via collision causation. The result is determined by comparing Waymo's third-party liability insurance claims data with mileage- and zip-code-calibrated Swiss Re (human driver) private passenger vehicle baselines. A liability claim is a request for compensation when someone is responsible for damage to property or injury to another person, typically following a collision. The reporting and development of liability claims follow insurance industry best practices for assessing crash causation contribution and predicting future crash contributions. In over 3.8 million miles driven without a human behind the steering wheel in rider-only (RO) mode, the Waymo Driver incurred zero bodily injury claims, compared with the human driver baseline of 1.11 claims per million miles (cpmm). The Waymo Driver also significantly reduced property damage claims to 0.78 cpmm, compared with the human driver baseline of 3.26 cpmm. Similarly, in a more statistically robust dataset of over 35 million miles during autonomous testing operations (TO), with a human autonomous specialist behind the steering wheel monitoring the automation, the Waymo Driver also significantly reduced both bodily injury and property damage cpmm compared to the human driver baselines.
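The abstract does not specify the statistical test used; one natural sanity check of the zero-claims result is a one-sided exact Poisson calculation under the stated human baseline, sketched below (an illustration, not a reproduction of the study's analysis):

```python
# How unlikely is zero bodily injury claims in 3.8M rider-only miles if
# the Waymo Driver matched the human baseline of 1.11 claims per million
# miles? An exact Poisson check, not necessarily the paper's method.
import math

ro_miles_millions = 3.8
baseline_cpmm = 1.11  # human bodily injury claims per million miles

expected_claims = baseline_cpmm * ro_miles_millions  # ~4.2 claims
p_zero = math.exp(-expected_claims)  # Poisson P(X = 0 | mean)

print(f"Expected claims under baseline: {expected_claims:.1f}")
print(f"P(zero claims if rates were equal): {p_zero:.3f}")  # ~0.015
```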
Abstract: This paper describes Waymo's Collision Avoidance Testing (CAT) methodology: a scenario-based testing method that evaluates the safety of the Waymo Driver Automated Driving System's (ADS) intended functionality in conflict situations initiated by other road users that require urgent evasive maneuvers. Because an SAE Level 4 ADS is responsible for the dynamic driving task (DDT) when engaged, without immediate human intervention, evaluating a Level 4 ADS using scenario-based testing is difficult due to the potentially infinite number of operational scenarios in which hazardous situations may unfold. To that end, this paper first describes the safety test objectives for the CAT methodology, including the collision and serious injury metrics and the reference behavior model, representing a non-impaired, eyes-on-conflict human driver, used to form an acceptance criterion. Afterward, we introduce the process for identifying potentially hazardous situations from a combination of human data, ADS testing data, and expert knowledge about the product design and associated Operational Design Domain (ODD). The test allocation and execution strategy is presented next; it exclusively utilizes simulations constructed from sensor data collected on a test track, from real-world driving, or from simulated sensor data. The paper concludes with results from applying CAT to the fully autonomous ride-hailing service that Waymo operates in San Francisco, California, and Phoenix, Arizona. The iterative nature of scenario identification, combined with over ten years of on-road testing experience, results in a scenario database that converges to a representative set of responder-role scenarios for a given ODD. Using Waymo's virtual test platform, which is calibrated to data collected over many years of ADS development, the CAT methodology provides a robust and scalable safety evaluation.
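To make the acceptance-criterion concept concrete, the sketch below compares ADS and reference-model outcomes over a set of conflict scenarios; the data structures, scenario names, and pass/fail rule are hypothetical simplifications of the methodology described above:

```python
# Simplified sketch of scenario-based acceptance testing: per scenario,
# compare the ADS outcome to a reference behavior model representing a
# non-impaired, eyes-on-conflict human driver. Everything here is
# hypothetical; the actual CAT metrics and criteria are richer.
from dataclasses import dataclass

@dataclass
class ScenarioOutcome:
    scenario_id: str
    ads_collision: bool
    reference_collision: bool

outcomes = [
    ScenarioOutcome("cut_in_001", ads_collision=False, reference_collision=True),
    ScenarioOutcome("red_light_runner_007", ads_collision=False, reference_collision=False),
    ScenarioOutcome("left_turn_042", ads_collision=True, reference_collision=True),
]

ads_collisions = sum(o.ads_collision for o in outcomes)
ref_collisions = sum(o.reference_collision for o in outcomes)

# Illustrative acceptance criterion: the ADS should perform at least as
# well as the reference model across the responder-role scenario set.
print(f"ADS collisions: {ads_collisions}, reference: {ref_collisions}")
print("PASS" if ads_collisions <= ref_collisions else "FAIL")
```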
Abstract: This report presents Waymo's proposal for a systematic fatigue risk management framework that addresses prevention, monitoring, and mitigation of fatigue-induced risks during on-road testing of ADS technology. The proposed framework remains flexible to incorporate continuous improvements and was informed by state-of-the-art practices, research, learnings, and experience (both internal and external to Waymo). Fatigue is a recognized contributory factor in a substantial fraction of on-road crashes involving human drivers, and mitigation of fatigue-induced risks remains an open concern researched worldwide. While the proposed framework was specifically designed for on-road testing of SAE Level 4 ADS technology, it has implications for, and applicability to, lower levels of automation as well.
Abstract: Waymo's safety methodologies, which draw on well-established engineering processes and address new safety challenges specific to Automated Vehicle technology, provide a firm foundation for safe deployment of Waymo's Level 4 ADS, which Waymo also refers to as the Waymo Driver. Waymo's determination of its readiness to deploy its AVs safely in different settings rests on that firm foundation and on a thorough analysis of risks specific to a particular Operational Design Domain. Waymo's process for making these readiness determinations entails an ordered examination of the relevant outputs from all of its safety methodologies, combined with careful safety and engineering judgment focused on the specific facts relevant to a particular determination. Waymo grants approval only when it determines the ADS is ready for the new conditions without creating any unreasonable risks to safety. This paper explains Waymo's methodologies as applied to the three layers of its technology: hardware, ADS behavior, and operations; it also explains Waymo's safety governance. Waymo will continue to apply and adapt those methodologies, and to learn from the important contributions of others in the AV industry, as it continues to build an ever-safer and more capable ADS.
Abstract: Waymo's mission to reduce traffic injuries and fatalities and improve mobility for all has led us to expand deployment of automated vehicles on public roads without a human driver behind the wheel. As part of this process, Waymo is committed to providing the public with informative and relevant data regarding the demonstrated safety of Waymo's automated driving system, which we call the Waymo Driver. The data presented in this paper represent more than 6.1 million miles of automated driving in the Phoenix, Arizona metropolitan area, including operations with a trained operator behind the steering wheel from calendar year 2019 and 65,000 miles of driverless operation without a human behind the steering wheel from 2019 and the first nine months of 2020. The paper includes every collision and minor contact experienced during these operations, as well as every predicted contact identified using Waymo's counterfactual ("what if") simulation of events had the vehicle's trained operator not disengaged automated driving. There were 47 contact events over this time period, consisting of 18 actual and 29 simulated contact events, none of which would be expected to result in severe or life-threatening injuries. This paper presents the collision typology and severity for each actual and simulated event, along with diagrams depicting each of the most significant events. Nearly all of the events involved one or more road rule violations or other errors by a human driver or road user, including all eight of the most severe events, which we define as involving actual or expected airbag deployment in any involved vehicle. When compared to national collision statistics, the Waymo Driver completely avoided certain collision modes that human-driven vehicles are frequently involved in, including road departure and collisions with fixed objects.
Abstract: Accurate, robust, inexpensive gaze tracking in the car can help keep a driver safe by facilitating more effective study of how to improve (1) vehicle interfaces and (2) the design of future Advanced Driver Assistance Systems. In this paper, we estimate head pose and eye pose from monocular video using methods developed extensively in prior work, and we ask two new questions. First, how much better can we classify driver gaze using head and eye pose together versus head pose alone? Second, are there individual-specific gaze strategies that strongly correlate with how much gaze classification improves with the addition of eye pose information? We answer these questions by evaluating data drawn from an on-road study of 40 drivers. The main insight of the paper is conveyed through the analogy of an "owl" and a "lizard," which describes the degree to which the eyes and the head move when shifting gaze. When the head moves a lot ("owl"), little classification improvement is attained by estimating eye pose on top of head pose. On the other hand, when the head stays still and only the eyes move ("lizard"), classification accuracy increases significantly from adding in eye pose. We characterize how that accuracy varies across people, gaze strategies, and gaze regions.
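A toy sketch of the owl/lizard comparison on synthetic data (this is not the study's dataset, features, or classifier; it only illustrates why eye pose adds little for an "owl" and much for a "lizard"):

```python
# Compare gaze-region classification from head pose alone vs. head plus
# eye pose, using synthetic data for a "lizard"-like driver whose head
# barely moves while the eyes carry most of the gaze signal.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000
gaze_region = rng.integers(0, 6, size=n)  # e.g., road, mirrors, console

# Weak head-pose signal, strong eye-pose signal ("lizard" strategy).
head_pose = 0.2 * gaze_region[:, None] + rng.normal(0.0, 1.0, (n, 3))
eye_pose = 1.0 * gaze_region[:, None] + rng.normal(0.0, 1.0, (n, 2))

clf = RandomForestClassifier(n_estimators=100, random_state=0)
acc_head = cross_val_score(clf, head_pose, gaze_region, cv=5).mean()
acc_both = cross_val_score(
    clf, np.hstack([head_pose, eye_pose]), gaze_region, cv=5
).mean()

print(f"Head pose only:  {acc_head:.2f}")
print(f"Head + eye pose: {acc_both:.2f}")  # larger gain for "lizard" drivers
```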