This paper focuses on safety performance testing and characterization of black-box highly automated vehicles (HAVs). Existing testing approaches typically obtain testing outcomes by deploying the HAV in a specific testing environment. Such an environment can involve various passively given testing strategies presented by other traffic participants, such as (i) a naturalistic driving policy learned from human drivers, (ii) concrete scenarios extracted from real-world driving data, and (iii) model-based or data-driven adversarial testing methodologies that focus on forcing safety-critical events. The safety performance of the HAV is then characterized by analyzing the obtained testing outcomes with a selected measure, such as the observed collision risk. These testing practices suffer from the scarcity of safety-critical events, have limited operational design domain (ODD) coverage, or are biased toward long-tail unsafe cases. This paper presents a novel and informative testing strategy that differs from these existing practices. The proposal is inspired by the intuition that a relatively safer HAV driving policy would allow the traffic vehicles to exhibit a higher level of aggressiveness while still achieving a fixed level of overall safety. One can characterize this interactive strategy between the HAV and the traffic vehicles and use it as a safety performance indicator for the HAV. Under the proposed testing scheme, the HAV is evaluated over its full ODD with a reward function that represents a trade-off between safety and adversity in generating safety-critical events. The proposed methodology is demonstrated in simulation with various HAV designs under different ODDs.