Abstract: Prediction, decision-making, and motion planning are essential for autonomous driving. In most contemporary works, they are treated as individual modules or combined into a multi-task learning paradigm with a shared backbone but separate task heads. However, we argue that they should be integrated into a comprehensive framework. Although several recent approaches follow this scheme, they suffer from complicated input representations and redundant framework designs. More importantly, they cannot make long-term predictions about future driving scenarios. To address these issues, we rethink the necessity of each module in an autonomous driving task and incorporate only the required modules into a minimalist autonomous driving framework. We propose BEVGPT, a generative pre-trained large model that integrates driving scenario prediction, decision-making, and motion planning. The model takes bird's-eye-view (BEV) images as its only input and makes driving decisions based on the surrounding traffic scenario. To ensure trajectory feasibility and smoothness, we develop an optimization-based motion planning method. We instantiate BEVGPT on the Lyft Level 5 dataset and use the Woven Planet L5Kit for realistic driving simulation. The effectiveness and robustness of the proposed framework are verified by the fact that it outperforms previous methods on 100% of the decision-making metrics and 66% of the motion planning metrics. Furthermore, the ability of our framework to accurately generate BEV images over long horizons is demonstrated on the driving scenario prediction task. To the best of our knowledge, this is the first generative pre-trained large model for autonomous driving prediction, decision-making, and motion planning that takes only BEV images as input.
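The abstract describes the architecture only at a high level. As a rough illustration of the autoregressive idea, the sketch below is a minimal PyTorch example, not the authors' code: it assumes BEV frames are first discretized into tokens (VQ-style), and every class name, head, and hyperparameter here is hypothetical.

```python
# A minimal sketch (hypothetical, not the BEVGPT implementation): a GPT-style
# causal transformer that autoregressively predicts the next tokenized BEV
# frame and a driving decision from the history of past BEV frames.
import torch
import torch.nn as nn

class BEVGPTSketch(nn.Module):
    def __init__(self, vocab_size=1024, d_model=256, n_layers=4, n_heads=8,
                 n_decisions=5, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)   # discrete BEV tokens
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.next_token_head = nn.Linear(d_model, vocab_size)  # next BEV token
        self.decision_head = nn.Linear(d_model, n_decisions)   # driving decision

    def forward(self, bev_tokens):          # (B, T) token ids of past BEV frames
        T = bev_tokens.size(1)
        pos = torch.arange(T, device=bev_tokens.device)
        x = self.tok_emb(bev_tokens) + self.pos_emb(pos)
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(bev_tokens.device)
        h = self.blocks(x, mask=mask)       # causal attention over the history
        return self.next_token_head(h), self.decision_head(h[:, -1])
```

Sampling the next-token head frame by frame is what would yield the long-term BEV prediction the abstract claims; the decision head reads the latest context position.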
Abstract: Model-based methods provide an effective approach to offline reinforcement learning (RL). They learn an environment dynamics model from interaction experiences and then perform policy optimization based on the learned model. However, previous model-based offline RL methods lack long-term prediction capability, resulting in large errors when generating multi-step trajectories. We address this issue by developing a sequence modeling architecture, Environment Transformer, which can generate reliable long-horizon trajectories from offline datasets. We then propose a novel model-based offline RL algorithm, ENTROPY, which learns the dynamics model and reward function via the ENvironment TRansformer and performs Offline PolicY optimization. We evaluate the proposed method on MuJoCo continuous-control RL environments. Results show that ENTROPY performs comparably to or better than state-of-the-art model-based and model-free offline RL methods, and demonstrates stronger long-term trajectory prediction than existing model-based offline methods.
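To make the sequence-modeling idea concrete, the sketch below is an assumption-laden PyTorch example, not the ENTROPY implementation: a causal transformer maps a history of (state, action) pairs to next-state and reward predictions, which enables autoregressive long-horizon rollouts. All names and hyperparameters are hypothetical.

```python
# Minimal sketch (assumptions, not the paper's code): a causal transformer
# dynamics model for offline RL. It predicts s_{t+1} and r_t from the
# (state, action) history and supports multi-step autoregressive rollouts.
import torch
import torch.nn as nn

class EnvTransformerSketch(nn.Module):
    def __init__(self, state_dim, action_dim, d_model=128, n_layers=3,
                 n_heads=4, max_len=256):
        super().__init__()
        self.in_proj = nn.Linear(state_dim + action_dim, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.state_head = nn.Linear(d_model, state_dim)   # predicts s_{t+1}
        self.reward_head = nn.Linear(d_model, 1)          # predicts r_t

    def forward(self, states, actions):   # (B, T, state_dim), (B, T, action_dim)
        T = states.size(1)
        pos = torch.arange(T, device=states.device)
        x = self.in_proj(torch.cat([states, actions], dim=-1)) + self.pos_emb(pos)
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(states.device)
        h = self.blocks(x, mask=mask)      # causal attention over the history
        return self.state_head(h), self.reward_head(h)

@torch.no_grad()
def rollout(model, s0, policy, horizon):
    """Autoregressive multi-step rollout from an initial state s0: (B, state_dim)."""
    states, actions = [s0], []
    for _ in range(horizon):
        actions.append(policy(states[-1]))
        s_pred, _ = model(torch.stack(states, 1), torch.stack(actions, 1))
        states.append(s_pred[:, -1])       # feed prediction back as next input
    return torch.stack(states, 1)
```

Conditioning each prediction on the full trajectory prefix, rather than only the last state as in typical MLP ensembles, is what would reduce compounding error over long horizons.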
Abstract: Catching high-speed targets in flight is a complex and representative highly dynamic task. In this paper, we propose Catch Planner, a planning-with-decision scheme for catching. For sequential decision making, we propose a policy search method based on deep reinforcement learning. To make catching adaptive and flexible, we propose a trajectory optimization method that jointly optimizes the highly coupled catching time and terminal state while accounting for dynamic feasibility and safety. We also propose a flexible constraint transcription method to catch targets at any reasonable attitude and terminal position bias. The proposed Catch Planner provides a new paradigm for combining learning and planning, and is integrated on a quadrotor of our own design, running at 100 Hz on the onboard computer. Extensive experiments in both real and simulated scenes verify the robustness of the proposed method and its extensibility to a variety of high-speed flying targets.
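The joint optimization of catching time and terminal state can be illustrated with a toy example. The sketch below is hypothetical and heavily simplified relative to the paper (one axis, a constant-velocity target, a quintic trajectory, soft penalties in place of hard constraints); `delta` stands in for the tolerated terminal-position bias.

```python
# Toy sketch (not the Catch Planner implementation): jointly optimize the
# catching time T and a terminal-position bias delta for a quintic trajectory
# toward a constant-velocity target, trading smoothness (integrated jerk)
# against time and bias penalties. One axis shown for brevity.
import numpy as np
from scipy.integrate import trapezoid
from scipy.optimize import minimize

p0, v0, a0 = 0.0, 0.0, 0.0          # quadrotor initial state on this axis
tp0, tv = 5.0, -1.5                 # target initial position and velocity

def quintic_coeffs(T, pT, vT, aT):
    """Solve the 6x6 linear system matching both boundary states."""
    A = np.array([[1, 0, 0,    0,       0,       0],
                  [0, 1, 0,    0,       0,       0],
                  [0, 0, 2,    0,       0,       0],
                  [1, T, T**2, T**3,    T**4,    T**5],
                  [0, 1, 2*T,  3*T**2,  4*T**3,  5*T**4],
                  [0, 0, 2,    6*T,     12*T**2, 20*T**3]])
    return np.linalg.solve(A, np.array([p0, v0, a0, pT, vT, aT]))

def cost(x):
    T, delta = x                    # catching time and terminal-position bias
    if T <= 0.1:
        return 1e9                  # keep T strictly positive
    # Terminal state: meet the target (plus bias) at its velocity, zero accel.
    c = quintic_coeffs(T, tp0 + tv * T + delta, tv, 0.0)
    ts = np.linspace(0.0, T, 50)
    jerk = 6*c[3] + 24*c[4]*ts + 60*c[5]*ts**2
    return trapezoid(jerk**2, ts) + 10.0 * T + 100.0 * delta**2

res = minimize(cost, x0=[2.0, 0.0], method="Nelder-Mead")
print("catch time T* = %.2f s, terminal bias = %.3f m" % tuple(res.x))
```

The paper's method additionally handles full 3D states, attitude constraints, and hard feasibility bounds; this toy problem only shows why time and terminal state must be optimized together: moving T shifts the rendezvous point, which in turn reshapes the whole trajectory.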