Kun Ren

(Re)$^2$H2O: Autonomous Driving Scenario Generation via Reversely Regularized Hybrid Offline-and-Online Reinforcement Learning

Feb 27, 2023

Haoyi Niu, Kun Ren, Yizhou Xu, Ziyuan Yang, Yichen Lin, Yi Zhang, Jianming Hu

Abstract: Autonomous driving and its widespread adoption have long held tremendous promise. Nevertheless, without a trustworthy and thorough testing procedure, the industry struggles to mass-produce autonomous vehicles (AVs), and neither the general public nor policymakers are convinced to accept the innovations. Generating safety-critical scenarios that present significant challenges to AVs is an essential first step in testing. Real-world datasets include naturalistic but overly safe driving behaviors, whereas simulation allows unrestricted exploration of diverse and aggressive traffic scenarios. However, the higher-dimensional search space in simulation prevents efficient scenario generation unless the real-world data distribution serves as an implicit constraint. To combine the benefits of both, it is appealing to learn to generate scenarios from offline real-world data and online simulation data simultaneously. We therefore tailor a Reversely Regularized Hybrid Offline-and-Online ((Re)$^2$H2O) Reinforcement Learning recipe that additionally penalizes Q-values on real-world data and rewards Q-values on simulated data, which ensures the generated scenarios are both varied and adversarial. In extensive experiments, our solution produces riskier scenarios than competitive baselines and generalizes to various autonomous driving models. In addition, the generated scenarios are shown to be useful for fine-tuning AV performance.
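
The abstract does not give the training objective, so the following is only a minimal sketch of what "penalize Q-values on real-world data and reward Q-values on simulated data" could look like as an extra term in a critic loss. The function names, the weight `beta`, and the use of simple batch means are all assumptions made for illustration and are not taken from the paper.

```python
from statistics import fmean


def reverse_regularizer(q_real, q_sim, beta=1.0):
    """Hypothetical regularizer: minimizing it lowers Q-values on
    real-world samples and raises Q-values on simulated samples."""
    return beta * (fmean(q_real) - fmean(q_sim))


def critic_loss(td_errors, q_real, q_sim, beta=1.0):
    """Assumed total loss: mean squared TD error plus the regularizer."""
    bellman = fmean([e * e for e in td_errors])
    return bellman + reverse_regularizer(q_real, q_sim, beta)


if __name__ == "__main__":
    # Toy numbers only; they do not come from any experiment.
    loss = critic_loss(
        td_errors=[1.0, -1.0],
        q_real=[2.0, 4.0],
        q_sim=[1.0, 1.0],
        beta=0.5,
    )
    print(loss)
```

In this sketch, a critic that assigns high values to real-world (overly safe) transitions is pushed down, while values on simulated transitions are pushed up, which is the direction of regularization the abstract describes. How the actual method weights or estimates these terms may differ.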
