Abstract: Dynamic and interactive traffic scenarios pose significant challenges for autonomous driving systems. Reinforcement learning (RL) offers a promising approach by enabling the exploration of driving policies beyond the constraints of pre-collected datasets and predefined conditions, particularly in complex environments. However, a critical challenge lies in effectively extracting spatial and temporal features from sequences of high-dimensional, multi-modal observations while minimizing the accumulation of errors over time. Additionally, efficiently guiding large-scale RL models to converge on optimal driving policies without frequent failures during training remains difficult. We propose an end-to-end model-based RL algorithm named Ramble to address these issues. Ramble processes multi-view RGB images and LiDAR point clouds into low-dimensional latent features to capture the context of the traffic scenario at each time step. A transformer-based architecture is then employed to model temporal dependencies and predict future states. By learning a dynamics model of the environment, Ramble can foresee upcoming traffic events and make more informed, strategic decisions. Our implementation demonstrates that prior experience in feature extraction and decision-making plays a pivotal role in accelerating the convergence of RL models toward optimal driving policies. Ramble achieves state-of-the-art performance in route completion rate and driving score on the CARLA Leaderboard 2.0, showcasing its effectiveness in managing complex and dynamic traffic situations.
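The sketch below illustrates the latent-dynamics idea described in this abstract: multi-modal observations are encoded into a low-dimensional latent per time step, and a causal transformer over the latent sequence predicts future states alongside an action output. This is a minimal illustration, not the authors' implementation; the module names, encoder architectures, latent dimension, fusion strategy, and action dimensionality are all assumptions.

```python
import torch
import torch.nn as nn

class LatentDynamicsSketch(nn.Module):
    """Toy sketch of encoding multi-view RGB + LiDAR into latents and modeling them with a transformer."""
    def __init__(self, latent_dim=256, n_cameras=4, n_layers=4, n_heads=8):
        super().__init__()
        # Placeholder per-camera image encoder and LiDAR encoder; a real system would use
        # much stronger backbones (e.g., a pretrained CNN and a point-cloud network).
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, latent_dim),
        )
        self.lidar_encoder = nn.Sequential(nn.Linear(1024, latent_dim), nn.ReLU())
        self.fuse = nn.Linear((n_cameras + 1) * latent_dim, latent_dim)
        # Causal transformer over the fused latents models temporal dependencies.
        layer = nn.TransformerEncoderLayer(latent_dim, n_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, n_layers)
        self.next_state_head = nn.Linear(latent_dim, latent_dim)  # predicts the next latent state
        self.policy_head = nn.Linear(latent_dim, 2)               # e.g., steering and acceleration

    def forward(self, images, lidar_feats):
        # images: (B, T, n_cameras, 3, H, W); lidar_feats: (B, T, 1024) pre-pooled LiDAR features.
        B, T, C, _, H, W = images.shape
        img_lat = self.image_encoder(images.view(B * T * C, 3, H, W)).view(B, T, C, -1)
        lid_lat = self.lidar_encoder(lidar_feats)
        z = self.fuse(torch.cat([img_lat.flatten(2), lid_lat], dim=-1))      # (B, T, latent_dim)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(T).to(z.device)
        h = self.temporal(z, mask=causal_mask)
        return self.next_state_head(h), self.policy_head(h)

# Example forward pass on dummy tensors (batch of 2, sequence length 6, 4 cameras):
model = LatentDynamicsSketch()
next_latents, actions = model(torch.randn(2, 6, 4, 3, 128, 128), torch.randn(2, 6, 1024))
```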
Abstract: Path planning plays a pivotal role in automated parking, yet current methods struggle to efficiently handle intricate and diverse parking scenarios. Reinforcement learning-based methods are a potential solution, as they can explore situations not covered by recorded data. However, a key challenge in training reinforcement learning methods is the inherent randomness in converging to a feasible policy. This paper introduces a novel solution, the Hybrid POlicy Path plannEr (HOPE), which integrates a reinforcement learning agent with Reeds-Shepp curves, enabling effective planning across diverse scenarios. The paper presents a method to calculate and implement an action mask mechanism in path planning, significantly boosting the efficiency and effectiveness of reinforcement learning training. A transformer is employed as the network structure to fuse environmental information and generate planned paths. To facilitate the training and evaluation of the proposed planner, we propose a criterion for categorizing the difficulty level of parking scenarios based on space and obstacle distribution. Experimental results demonstrate that our approach outperforms typical rule-based algorithms and traditional reinforcement learning methods, achieving higher planning success rates and better generalization across various scenarios. The code for our solution will be openly available on \href{https://github.com/jiamiya/HOPE}{GitHub}.
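As a rough illustration of the action-mask idea mentioned in this abstract, the sketch below masks out discrete planning actions whose simulated next pose would collide before the agent samples an action. This is not the released HOPE code: the bicycle-model kinematics, the discrete action grid, and the toy collision checker are illustrative assumptions.

```python
import math
import numpy as np

def next_pose(pose, steer, step, wheelbase=2.8):
    """Simple bicycle-model rollout of one candidate action; returns (x, y, heading)."""
    x, y, yaw = pose
    yaw_new = yaw + step / wheelbase * math.tan(steer)
    return (x + step * math.cos(yaw), y + step * math.sin(yaw), yaw_new)

def build_action_mask(pose, actions, collides):
    """Boolean mask: True where the candidate (steer, step) keeps the vehicle collision-free."""
    return np.array([not collides(next_pose(pose, steer, step)) for steer, step in actions])

def masked_sample(logits, mask, rng=np.random.default_rng()):
    """Sample an action index from softmax(logits) restricted to feasible actions."""
    logits = np.where(mask, logits, -np.inf)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

# Example: a toy collision checker that forbids leaving a 10 m x 10 m parking area.
actions = [(steer, step) for steer in (-0.6, 0.0, 0.6) for step in (-1.0, 1.0)]
collides = lambda p: not (0.0 <= p[0] <= 10.0 and 0.0 <= p[1] <= 10.0)
mask = build_action_mask((5.0, 5.0, 0.0), actions, collides)
action_idx = masked_sample(np.zeros(len(actions)), mask)
```

Masking infeasible actions before sampling keeps exploration inside the drivable set, which is the efficiency benefit the abstract attributes to the mechanism.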
Abstract: Tactics2D is an open-source multi-agent reinforcement learning library with a Python backend. Its goal is to provide a convenient toolset for researchers to develop decision-making algorithms for autonomous driving. The library includes diverse traffic scenarios implemented as gym-based environments, equipped with multi-sensory capabilities and traffic-rule violation detection. It also features a reinforcement learning baseline evaluated with reasonable metrics. Tactics2D is highly modular and customizable. The source code of Tactics2D is available at https://github.com/WoodOxen/Tactics2D.
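Since the abstract describes the scenarios as gym-based environments, a standard Gymnasium-style rollout loop like the one below shows how such an environment would typically be driven. The environment ID "Tactics2D/Parking-v0" and the use of the gymnasium package are assumptions for illustration, not the library's documented API; see the repository for the actual interface.

```python
import gymnasium as gym

def random_rollout(env_id="Tactics2D/Parking-v0", episodes=3):
    # Hypothetical environment ID; replace with one registered by the installed library.
    env = gym.make(env_id)
    for _ in range(episodes):
        obs, info = env.reset()
        terminated = truncated = False
        total_reward = 0.0
        while not (terminated or truncated):
            action = env.action_space.sample()  # replace with a trained policy
            obs, reward, terminated, truncated, info = env.step(action)
            total_reward += reward
        print(f"episode return: {total_reward:.2f}")
    env.close()

if __name__ == "__main__":
    random_rollout()
```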
Abstract: Recently, 3D understanding research has paid increasing attention to extracting features directly from point clouds, making the description of shape patterns in points essential. Inspired by SIFT, an outstanding 2D shape descriptor, we design a PointSIFT module that encodes information from different orientations and is adaptive to the scale of the shape. In particular, an orientation-encoding unit is designed to describe eight crucial orientations; by stacking several orientation-encoding units, we obtain a multi-scale representation. Extensive experiments show that our PointSIFT-based framework outperforms state-of-the-art methods on standard benchmark datasets. The code and trained models will be published along with this paper.
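The sketch below illustrates the orientation-encoding idea in this abstract: for every point, gather the feature of one neighbor in each of the eight surrounding octants, then mix the eight directional features with a learned layer. This is an illustrative simplification, not the released implementation; the brute-force octant search, the fallback for empty octants, and the linear mixing layer are assumptions.

```python
import torch
import torch.nn as nn

class OrientationEncodingSketch(nn.Module):
    """Gathers one nearest neighbor per octant around each point and fuses the 8 directional features."""
    def __init__(self, channels=64):
        super().__init__()
        self.mix = nn.Sequential(nn.Linear(8 * channels, channels), nn.ReLU())

    def forward(self, xyz, feats):
        # xyz: (N, 3) point coordinates; feats: (N, C) per-point features.
        diff = xyz.unsqueeze(0) - xyz.unsqueeze(1)          # (N, N, 3) offsets of j relative to i
        dist = diff.norm(dim=-1)                            # (N, N) pairwise distances
        # Octant index 0..7 from the signs of (dx, dy, dz).
        octant = ((diff[..., 0] > 0).long() * 4
                  + (diff[..., 1] > 0).long() * 2
                  + (diff[..., 2] > 0).long())
        gathered = []
        for o in range(8):
            d = dist.masked_fill(octant != o, float("inf"))
            d.fill_diagonal_(float("inf"))                  # exclude the center point itself
            idx = d.argmin(dim=1)                           # nearest neighbor within octant o
            empty = torch.isinf(d.min(dim=1).values)        # octant may contain no points
            nb = feats[idx]
            nb[empty] = feats[empty]                        # fall back to the center point's feature
            gathered.append(nb)
        return self.mix(torch.cat(gathered, dim=-1))        # (N, C) orientation-encoded features

# Example usage on a random toy point cloud:
oe = OrientationEncodingSketch(channels=64)
points, features = torch.rand(128, 3), torch.rand(128, 64)
encoded = oe(points, features)
```

Stacking several such units, as the abstract describes, lets each additional unit aggregate information from a wider neighborhood, which is where the multi-scale behavior comes from.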