Abstract: The MUSICC project has created a proof-of-concept scenario database to be used as part of a type approval process for the verification of automated driving systems (ADS). Such a process must include a highly automated means of evaluating test results, as manual review at the required scale is impractical. This paper sets out a framework for assessing an ADS's behavioural safety in normal operation, i.e. performance of the dynamic driving task in the absence of component failures or malicious actions. Five top-level evaluation criteria for ADS performance are identified. Implementing these requires two types of outcome scoring rule: prescriptive (measurable rules which must always be followed) and risk-based (undesirable outcomes which must not occur too often). Scoring rules are defined in a programming language and stored as part of the scenario description. Because risk-based rules cannot yield a pass/fail decision from a single test case, a framework is defined to reach a decision for each functional scenario (a set of test cases with common features) by considering statistical performance across many individual tests. Implications of this framework for hypothesis testing and scenario selection are identified.