Abstract: With autonomous vehicles (AVs), a major concern is the inability to give meaningful quantitative assurance of safety, to the extent required by society - e.g. that an AV must be at least as safe as a good human driver - before that AV is in extensive use. We demonstrate an approach to achieving more moderate, but useful, confidence: e.g., confidence that the probability of causing accidents in the early phases of operation is low enough. This formalises mathematically the common approach of operating a system on a limited basis, in the hope that mishap-free operation will confirm one's confidence in its safety and allow progressively more extensive operation: a process of "bootstrapping" confidence. Translating this intuitive approach into theorems shows: (1) that it is substantially sound in the right circumstances, and could be a good method for deciding about the early deployment phase of an AV; (2) how much confidence can rightly be derived from such a "cautious deployment" approach, so that over-optimism is avoided; (3) under which conditions our sound formulas for future confidence are applicable; (4) thus, which analyses of the concrete situation, and/or constraints on practice, are needed in order to enjoy the advantages of provably correct confidence in adequate future safety.
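To make the "bootstrapping" intuition concrete, the following toy sketch (not the paper's theorems) shows how confidence in a bound on the per-mile accident probability grows with accident-free operation under a simple Bayesian model; the Beta prior, target bound, and mileages are assumptions invented for illustration.

```python
# Toy sketch of "bootstrapping" confidence (illustrative, not the paper's
# theorems). Assumptions: accidents are independent Bernoulli events per mile,
# with a Beta(a, b) prior on the per-mile accident probability p; the prior
# parameters and target bound below are invented for illustration.
from scipy.stats import beta

a, b = 1.0, 1_000.0   # hypothetical prior: weak belief that p is small
p_target = 1e-4       # claim to support: at most 1 accident per 10,000 miles

for miles in (0, 10_000, 100_000, 1_000_000):
    # After `miles` accident-free miles, the posterior on p is Beta(a, b + miles)
    confidence = beta.cdf(p_target, a, b + miles)
    print(f"{miles:>9,} accident-free miles -> P(p <= {p_target}) = {confidence:.3f}")
```

Confidence grows with mishap-free mileage, as the intuitive argument expects; the theorems in the paper delimit when such growth is sound and how much confidence may rightly be claimed.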
Abstract: As the primary cause of software defects, human error is the key to understanding them, and perhaps to predicting and avoiding them. Little research has been done on predicting defects on the basis of the cognitive errors that cause them. This paper proposes an approach to predicting software defects through knowledge of the cognitive mechanisms of human error. Our theory is that the main process behind a software defect is that an error-prone scenario triggers human error modes, which psychologists have observed to recur across diverse activities. Software defects can then be predicted by identifying such scenarios, guided by this knowledge of typical error modes. The proposed approach emphasizes predicting the exact location and form of a possible defect. We conducted two case studies to demonstrate and validate the approach, with 55 programmers in a programming competition and 5 analysts serving as the users of the approach. Notably, the approach predicted, at the requirements phase, the exact locations and forms of 7 of the 22 (31.8%) specific types of defects later found in the code. The predicted defects tended to be common ones: their occurrences constituted 75.7% of the total number of defects in the 55 developed programs, and each was introduced by at least two people. On average over all programmers, 75% of the defects a programmer introduced had been predicted. Furthermore, the predicted defects were highly persistent through the debugging process: had the predictions been used to prevent these defects, 46.2% of the debugging iterations could have been saved. This capability to forecast the exact locations and forms of possible defects in the early phases of software development recommends the approach for substantial benefits to defect prevention and early detection.
Abstract: Context: Demonstrating high reliability and safety for safety-critical systems (SCSs) remains a hard problem. Diverse evidence needs to be combined in a rigorous way: in particular, results of operational testing with other evidence from design and verification. The growing use of machine learning in SCSs, by precluding most established methods for gaining assurance, makes operational testing even more important for supporting safety and reliability claims. Objective: We use autonomous vehicles (AVs) as a current example to revisit the problem of demonstrating high reliability. AVs are making their debut on public roads: methods for assessing whether an AV is safe enough are urgently needed. We demonstrate how to answer five questions that would arise in assessing an AV type, starting with those proposed by a highly cited study. Method: We apply new theorems extending Conservative Bayesian Inference (CBI), which exploit the rigour of Bayesian methods while reducing the risk of involuntary misuse associated with now-common applications of Bayesian inference; we define additional conditions needed for applying these methods to AVs. Results: Prior knowledge can bring substantial advantages if the AV design allows strong expectations of safety before road testing. We also show how naive attempts at conservative assessment may lead to over-optimism instead; why extrapolating the trend of disengagements is not suitable for safety claims; and how to use knowledge that an AV has moved to a less stressful environment. Conclusion: While some reliability targets will remain too high to be practically verifiable, CBI removes a major source of doubt: it allows the use of prior knowledge without inducing dangerously optimistic biases. For certain ranges of required reliability and prior beliefs, CBI thus supports feasible, sound arguments. Useful conservative claims can be derived from limited prior knowledge.
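A minimal numerical sketch of the CBI idea (with assumed parameters; this is not one of the paper's theorems): treat the prior as only partially known, and report the worst-case posterior claim over all priors consistent with the stated constraints. With only a single prior constraint, the worst case shows essentially no confidence growth from testing, which illustrates why the additional conditions the paper defines matter.

```python
# Sketch of conservative Bayesian reasoning (illustrative; the constraint,
# mileage, and bounds are assumptions, not the paper's). The claim is that
# the per-mile accident probability p satisfies p <= epsilon; the only prior
# knowledge is P(p <= epsilon) >= theta plus certainty that p <= p_upper.
import numpy as np

n = 100_000      # hypothetical accident-free miles observed
epsilon = 1e-5   # claimed bound on the per-mile accident probability
theta = 0.5      # assumed prior confidence: P(p <= epsilon) >= theta
p_upper = 1e-3   # assumed certain upper bound on p

grid = np.logspace(-8, np.log10(p_upper), 2000)   # candidate values of p
likelihood = (1.0 - grid) ** n                    # P(n accident-free miles | p)

# Worst case over two-point priors (the extreme points of the constraint set):
# theta mass at some p1 <= epsilon, the remaining mass at some p2 > epsilon.
num = theta * likelihood[grid <= epsilon]
alt = (1.0 - theta) * likelihood[grid > epsilon]
posterior = num[:, None] / (num[:, None] + alt[None, :])
print(f"worst-case posterior P(p <= {epsilon}) = {posterior.min():.3f}")  # ~= theta
```

Here the worst-case posterior barely exceeds the prior confidence theta: with such weak constraints, accident-free miles add almost nothing, so sound confidence growth requires further conditions of the kind the paper identifies.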
Abstract: There is an urgent societal need to assess whether autonomous vehicles (AVs) are safe enough. From published quantitative safety and reliability assessments of AVs, we know that, given the goal of predicting very low rates of accidents, road testing alone would require infeasible numbers of miles to be driven. However, previous analyses do not consider any knowledge prior to road testing - knowledge which could bring substantial advantages if the AV design allows strong expectations of safety before road testing. We present the advantages of a new variant of Conservative Bayesian Inference (CBI), which uses prior knowledge while avoiding optimistic biases. We then study the trend of disengagements (take-overs by human drivers) by applying Software Reliability Growth Models (SRGMs) to data from Waymo's public road testing over 51 months, taking into account the practice of software updates during this testing. Our approach is not to trust any specific SRGM, but to assess the accuracy of its forecasts and then to improve them. We show that, coupled with accuracy assessment and recalibration techniques, SRGMs can be a valuable aid to test planning.
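As a sketch of this test-planning use of SRGMs (with synthetic data and an assumed Goel-Okumoto model, not Waymo's actual counts or the paper's exact procedure), one can refit the model to the cumulative disengagement record month by month and score one-step-ahead forecasts, the accuracy-assessment step on which recalibration would then build.

```python
# Sketch of SRGM forecasting with accuracy assessment (the Goel-Okumoto model
# choice and the synthetic monthly counts are assumptions for illustration;
# they are not Waymo's data or the paper's procedure).
import numpy as np
from scipy.optimize import curve_fit

def goel_okumoto(t, a, b):
    # Expected cumulative number of disengagements by month t
    return a * (1.0 - np.exp(-b * t))

rng = np.random.default_rng(0)
months = np.arange(1, 25, dtype=float)
expected = goel_okumoto(months, 300.0, 0.15)
cum = np.cumsum(rng.poisson(np.diff(expected, prepend=0.0)))  # synthetic record

errors = []
for k in range(12, len(months)):   # rolling one-step-ahead forecasts
    (a, b), _ = curve_fit(goel_okumoto, months[:k], cum[:k],
                          p0=(2.0 * cum[k - 1], 0.1), maxfev=10_000)
    errors.append(goel_okumoto(months[k], a, b) - cum[k])
print(f"mean one-step-ahead error: {np.mean(errors):+.2f} disengagements")
```

Systematic over- or under-prediction in such forecast errors is what recalibration techniques would then correct before the model is trusted for planning.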