The increasing complexity of robots and autonomous agents that interact with people highlights the critical need for approaches that test them systematically before deployment. This review paper presents a general framework for such testing, describes the insights we have gained from working on each component of the framework, and shows how integrating these components leads to the discovery of diverse, realistic, and challenging scenarios that reveal previously unknown failures in deployed robotic systems that interact with people.