Intelligent Transportation Thrust, Systems Hub, The Hong Kong University of Science and Technology
Abstract:Planning safe and efficient trajectories through signal-free intersections presents significant challenges for autonomous vehicles (AVs), particularly in dynamic, multi-task environments with unpredictable interactions and an increased possibility of conflicts. This study aims to address these challenges by developing a robust, adaptive framework to ensure safety in such complex scenarios. Existing approaches often struggle to provide reliable safety mechanisms in dynamic and learn multi-task behaviors from demonstrations in signal-free intersections. This study proposes a safety-critical planning method that integrates Dynamic High-Order Control Barrier Functions (DHOCBF) with a diffusion-based model, called Dynamic Safety-Critical Diffuser (DSC-Diffuser), offering a robust solution for adaptive, safe, and multi-task driving in signal-free intersections. Our approach incorporates a goal-oriented, task-guided diffusion model, enabling the model to learn multiple driving tasks simultaneously from real-world data. To further ensure driving safety in dynamic environments, the proposed DHOCBF framework dynamically adjusts to account for the movements of surrounding vehicles, offering enhanced adaptability compared to traditional control barrier functions. Validity evaluations of DHOCBF, conducted through numerical simulations, demonstrate its robustness in adapting to variations in obstacle velocities, sizes, uncertainties, and locations, effectively maintaining driving safety across a wide range of complex and uncertain scenarios. Performance evaluations across various scenes confirm that DSC-Diffuser provides realistic, stable, and generalizable policies, equipping it with the flexibility to adapt to diverse driving tasks.
Abstract:Navigating complex traffic environments has been significantly enhanced by advancements in intelligent technologies, enabling accurate environment perception and trajectory prediction for automated vehicles. However, existing research often neglects the consideration of the joint reasoning of scenario agents and lacks interpretability in trajectory prediction models, thereby limiting their practical application in real-world scenarios. To this purpose, an explainability-oriented trajectory prediction model is designed in this work, named Explainable Conditional Diffusion based Multimodal Trajectory Prediction Traj-Explainer, to retrieve the influencing factors of prediction and help understand the intrinsic mechanism of prediction. In Traj-Explainer, a modified conditional diffusion is well designed to capture the scenario multimodal trajectory pattern, and meanwhile, a modified Shapley Value model is assembled to rationally learn the importance of the global and scenario features. Numerical experiments are carried out by several trajectory prediction datasets, including Waymo, NGSIM, HighD, and MoCAD datasets. Furthermore, we evaluate the identified input factors which indicates that they are in agreement with the human driving experience, indicating the capability of the proposed model in appropriately learning the prediction. Code available in our open-source repository: \url{https://anonymous.4open.science/r/Interpretable-Prediction}.
Abstract:Diffusion models demonstrate superior performance in capturing complex distributions from large-scale datasets, providing a promising solution for quadrupedal locomotion control. However, offline policy is sensitive to Out-of-Distribution (OOD) states due to the limited state coverage in the datasets. In this work, we propose a two-stage learning framework combining offline learning and online preference alignment for legged locomotion control. Through the offline stage, the diffusion planner learns the joint distribution of state-action sequences from expert datasets without using reward labels. Subsequently, we perform the online interaction in the simulation environment based on the trained offline planer, which significantly addresses the OOD issues and improves the robustness. Specifically, we propose a novel weak preference labeling method without the ground-truth reward or human preferences. The proposed method exhibits superior stability and velocity tracking accuracy in pacing, trotting, and bounding gait under both slow- and high-speed scenarios and can perform zero-shot transfer to the real Unitree Go1 robots. The project website for this paper is at https://shangjaven.github.io/preference-aligned-diffusion-legged/.
Abstract:The on-board 3D object detection technology has received extensive attention as a critical technology for autonomous driving, while few studies have focused on applying roadside sensors in 3D traffic object detection. Existing studies achieve the projection of 2D image features to 3D features through height estimation based on the frustum. However, they did not consider the height alignment and the extraction efficiency of bird's-eye-view features. We propose a novel 3D object detection framework integrating Spatial Former and Voxel Pooling Former to enhance 2D-to-3D projection based on height estimation. Extensive experiments were conducted using the Rope3D and DAIR-V2X-I dataset, and the results demonstrated the outperformance of the proposed algorithm in the detection of both vehicles and cyclists. These results indicate that the algorithm is robust and generalized under various detection scenarios. Improving the accuracy of 3D object detection on the roadside is conducive to building a safe and trustworthy intelligent transportation system of vehicle-road coordination and promoting the large-scale application of autonomous driving. The code and pre-trained models will be released on https://anonymous.4open.science/r/HeightFormer.
Abstract:Developing efficient traffic models is essential for optimizing transportation systems, yet current approaches remain time-intensive and susceptible to human errors due to their reliance on manual processes. Traditional workflows involve exhaustive literature reviews, formula optimization, and iterative testing, leading to inefficiencies in research. In response, we introduce the Traffic Research Agent (TR-Agent), an AI-driven system designed to autonomously develop and refine traffic models through an iterative, closed-loop process. Specifically, we divide the research pipeline into four key stages: idea generation, theory formulation, theory evaluation, and iterative optimization; and construct TR-Agent with four corresponding modules: Idea Generator, Code Generator, Evaluator, and Analyzer. Working in synergy, these modules retrieve knowledge from external resources, generate novel ideas, implement and debug models, and finally assess them on the evaluation datasets. Furthermore, the system continuously refines these models based on iterative feedback, enhancing research efficiency and model performance. Experimental results demonstrate that TR-Agent achieves significant performance improvements across multiple traffic models, including the Intelligent Driver Model (IDM) for car following, the MOBIL lane-changing model, and the Lighthill-Whitham-Richards (LWR) traffic flow model. Additionally, TR-Agent provides detailed explanations for its optimizations, allowing researchers to verify and build upon its improvements easily. This flexibility makes the framework a powerful tool for researchers in transportation and beyond. To further support research and collaboration, we have open-sourced both the code and data used in our experiments, facilitating broader access and enabling continued advancements in the field.
Abstract:Autonomous driving technology has witnessed rapid advancements, with foundation models improving interactivity and user experiences. However, current autonomous vehicles (AVs) face significant limitations in delivering command-based driving styles. Most existing methods either rely on predefined driving styles that require expert input or use data-driven techniques like Inverse Reinforcement Learning to extract styles from driving data. These approaches, though effective in some cases, face challenges: difficulty obtaining specific driving data for style matching (e.g., in Robotaxis), inability to align driving style metrics with user preferences, and limitations to pre-existing styles, restricting customization and generalization to new commands. This paper introduces Words2Wheels, a framework that automatically generates customized driving policies based on natural language user commands. Words2Wheels employs a Style-Customized Reward Function to generate a Style-Customized Driving Policy without relying on prior driving data. By leveraging large language models and a Driving Style Database, the framework efficiently retrieves, adapts, and generalizes driving styles. A Statistical Evaluation module ensures alignment with user preferences. Experimental results demonstrate that Words2Wheels outperforms existing methods in accuracy, generalization, and adaptability, offering a novel solution for customized AV driving behavior. Code and demo available at https://yokhon.github.io/Words2Wheels/.
Abstract:To alleviate energy shortages and environmental impacts caused by transportation, this study introduces EcoFollower, a novel eco-car-following model developed using reinforcement learning (RL) to optimize fuel consumption in car-following scenarios. Employing the NGSIM datasets, the performance of EcoFollower was assessed in comparison with the well-established Intelligent Driver Model (IDM). The findings demonstrate that EcoFollower excels in simulating realistic driving behaviors, maintaining smooth vehicle operations, and closely matching the ground truth metrics of time-to-collision (TTC), headway, and comfort. Notably, the model achieved a significant reduction in fuel consumption, lowering it by 10.42\% compared to actual driving scenarios. These results underscore the capability of RL-based models like EcoFollower to enhance autonomous vehicle algorithms, promoting safer and more energy-efficient driving strategies.
Abstract:The continual evolution of autonomous driving technology requires car-following models that can adapt to diverse and dynamic traffic environments. Traditional learning-based models often suffer from performance degradation when encountering unseen traffic patterns due to a lack of continual learning capabilities. This paper proposes a novel car-following model based on continual learning that addresses this limitation. Our framework incorporates Elastic Weight Consolidation (EWC) and Memory Aware Synapses (MAS) techniques to mitigate catastrophic forgetting and enable the model to learn incrementally from new traffic data streams. We evaluate the performance of the proposed model on the Waymo and Lyft datasets which encompass various traffic scenarios. The results demonstrate that the continual learning techniques significantly outperform the baseline model, achieving 0\% collision rates across all traffic conditions. This research contributes to the advancement of autonomous driving technology by fostering the development of more robust and adaptable car-following models.
Abstract:Accurate modeling of car-following behaviors is essential for various applications in traffic management and autonomous driving systems. However, current approaches often suffer from limitations like high sensitivity to data quality and lack of interpretability. In this study, we propose GenFollower, a novel zero-shot prompting approach that leverages large language models (LLMs) to address these challenges. We reframe car-following behavior as a language modeling problem and integrate heterogeneous inputs into structured prompts for LLMs. This approach achieves improved prediction performance and interpretability compared to traditional baseline models. Experiments on the Waymo Open datasets demonstrate GenFollower's superior performance and ability to provide interpretable insights into factors influencing car-following behavior. This work contributes to advancing the understanding and prediction of car-following behaviors, paving the way for enhanced traffic management and autonomous driving systems.
Abstract:Connected and Automated Vehicles (CAVs) offer a promising solution to the challenges of mixed traffic with both CAVs and Human-Driven Vehicles (HDVs). A significant hurdle in such scenarios is traffic oscillation, or the "stop-and-go" pattern, during car-following situations. While HDVs rely on limited information, CAVs can leverage data from other CAVs for better decision-making. This allows CAVs to anticipate and mitigate the spread of deceleration waves that worsen traffic flow. We propose a novel "CAV-AHDV-CAV" car-following framework that treats the sequence of HDVs between two CAVs as a single entity, eliminating noise from individual driver behaviors. This deep reinforcement learning approach analyzes vehicle equilibrium states and employs a state fusion strategy. Trained and tested on diverse datasets (HighD, NGSIM, SPMD, Waymo, Lyft) encompassing over 70,000 car-following instances, our model outperforms baselines in collision avoidance, maintaining equilibrium with both preceding and leading vehicles and achieving the lowest standard deviation of time headway. These results demonstrate the effectiveness of our approach in developing robust CAV control strategies for mixed traffic. Our model has the potential to mitigate traffic oscillation, improve traffic flow efficiency, and enhance overall safety.