Meixin Zhu

Intelligent Transportation Thrust, Systems Hub, The Hong Kong University of Science and Technology

Effective Reinforcement Learning Control using Conservative Soft Actor-Critic

May 06, 2025

Xinyi Yuan, Zhiwei Shang, Wenjun Huang, Yunduan Cui, Di Chen, Meixin Zhu

Abstract: Reinforcement Learning (RL) has shown great potential in complex control tasks, particularly when combined with deep neural networks within the Actor-Critic (AC) framework. However, in practical applications, balancing exploration, learning stability, and sample efficiency remains a significant challenge. Traditional methods such as Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) address these issues by incorporating entropy or relative entropy regularization, but often face problems of instability and low sample efficiency. In this paper, we propose the Conservative Soft Actor-Critic (CSAC) algorithm, which seamlessly integrates entropy and relative entropy regularization within the AC framework. CSAC improves exploration through entropy regularization while avoiding overly aggressive policy updates with the use of relative entropy regularization. Evaluations on benchmark tasks and real-world robotic simulations demonstrate that CSAC offers significant improvements in stability and efficiency over existing methods. These findings suggest that CSAC provides strong robustness and application potential in control tasks under dynamic environments.

* 14 pages, 9 figures
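
The abstract above describes combining entropy regularization (for exploration) with relative entropy regularization (to keep policy updates conservative). The following is only an illustrative sketch of that general idea for a single state with discrete actions, not the CSAC algorithm itself. It uses the standard closed-form maximizer of E_pi[Q] + alpha * H(pi) - beta * KL(pi || pi_old); the function name, the coefficients `alpha` and `beta`, and the toy Q-values are all assumptions made for illustration.

```python
import numpy as np

def regularized_policy_update(q_values, old_policy, alpha=0.2, beta=1.0):
    """Maximize E_pi[Q] + alpha * H(pi) - beta * KL(pi || old_policy).

    For discrete actions the maximizer is proportional to
    old_policy ** (beta / (alpha + beta)) * exp(Q / (alpha + beta)).
    Assumes old_policy is strictly positive and alpha + beta > 0.
    """
    q = np.asarray(q_values, dtype=float)
    old = np.asarray(old_policy, dtype=float)
    logits = (q + beta * np.log(old)) / (alpha + beta)
    logits -= logits.max()  # subtract the max for numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum()

# Hypothetical toy problem: three actions, uniform previous policy.
q = np.array([1.0, 0.5, 0.0])
uniform = np.ones(3) / 3

# beta = 0 removes the relative entropy term (a plain soft-max over Q / alpha).
aggressive = regularized_policy_update(q, uniform, alpha=0.2, beta=0.0)
# A larger beta keeps the new policy close to the previous one.
conservative = regularized_policy_update(q, uniform, alpha=0.2, beta=5.0)
```

In this toy setting, raising `beta` shrinks the step away from the previous policy, which is the intuition behind using relative entropy to avoid overly aggressive updates.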

High-resolution on-road air pollution exposure informed by taxi-based mobile monitoring sensors

Dec 13, 2024

Hui Zhong, Xinhu Zheng, Ting Gan, Yonghong Liu, Meixin Zhu

Abstract: Air pollutant exposure exhibits significant spatial and temporal variability, with localized hotspots, particularly in traffic microenvironments, posing health risks to commuters. Although widely used for air quality assessment, fixed-site monitoring stations are limited by sparse distribution, high costs, and maintenance needs, making them less effective in capturing on-road pollution levels. This study utilizes a fleet of 314 taxis equipped with sensors to measure NO₂, PM₂.₅, and PM₁₀ concentrations and identify high-exposure hotspots. The findings reveal disparities between mobile and stationary measurements, map the spatiotemporal exposure patterns, and highlight local hotspots. These results demonstrate the potential of mobile monitoring to provide fine-scale, on-road air pollution assessments, offering valuable insights for policymakers to design targeted interventions and protect public health, particularly for sensitive populations.

Dynamic High-Order Control Barrier Functions with Diffuser for Safety-Critical Trajectory Planning at Signal-Free Intersections

Nov 29, 2024

Di Chen, Ruiguo Zhong, Kehua Chen, Zhiwei Shang, Meixin Zhu, Edward Chung

Abstract: Planning safe and efficient trajectories through signal-free intersections presents significant challenges for autonomous vehicles (AVs), particularly in dynamic, multi-task environments with unpredictable interactions and an increased possibility of conflicts. This study aims to address these challenges by developing a robust, adaptive framework to ensure safety in such complex scenarios. Existing approaches often struggle to provide reliable safety mechanisms in dynamic environments and to learn multi-task behaviors from demonstrations at signal-free intersections. This study proposes a safety-critical planning method that integrates Dynamic High-Order Control Barrier Functions (DHOCBF) with a diffusion-based model, called Dynamic Safety-Critical Diffuser (DSC-Diffuser), offering a robust solution for adaptive, safe, and multi-task driving at signal-free intersections. Our approach incorporates a goal-oriented, task-guided diffusion model, enabling the model to learn multiple driving tasks simultaneously from real-world data. To further ensure driving safety in dynamic environments, the proposed DHOCBF framework dynamically adjusts to account for the movements of surrounding vehicles, offering enhanced adaptability compared to traditional control barrier functions. Validity evaluations of DHOCBF, conducted through numerical simulations, demonstrate its robustness in adapting to variations in obstacle velocities, sizes, uncertainties, and locations, effectively maintaining driving safety across a wide range of complex and uncertain scenarios. Performance evaluations across various scenes confirm that DSC-Diffuser provides realistic, stable, and generalizable policies, equipping it with the flexibility to adapt to diverse driving tasks.

* 7 figures, 3 tables, 12 pages
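
For readers unfamiliar with control barrier functions, the sketch below shows the basic idea in its simplest form: a first-order barrier for a one-dimensional vehicle whose control input is its speed. It is not the DHOCBF formulation from the paper; the barrier h = x_obs - x - d_safe, the gain `gamma`, and all numeric values are assumptions chosen for illustration. The safety condition dh/dt + gamma * h >= 0 reduces here to an upper bound on speed, so the usual quadratic program has a closed-form solution.

```python
def cbf_safe_speed(u_nominal, x, x_obs, v_obs, d_safe=2.0, gamma=1.0):
    """Filter a nominal speed command through a first-order barrier condition.

    Barrier: h = x_obs - x - d_safe (positive when the safety gap is kept).
    With dx/dt = u, the condition dh/dt + gamma * h >= 0 becomes
    u <= v_obs + gamma * h, so the closest safe command is a simple min().
    """
    h = x_obs - x - d_safe
    return min(u_nominal, v_obs + gamma * h)

def simulate(steps=2000, dt=0.01):
    """Drive toward a stationary obstacle with an unsafe nominal command."""
    x, x_obs, v_obs = 0.0, 10.0, 0.0
    min_h = float("inf")
    for _ in range(steps):
        u = cbf_safe_speed(5.0, x, x_obs, v_obs)
        x += dt * u            # forward Euler step for the ego vehicle
        x_obs += dt * v_obs    # obstacle motion (zero in this toy case)
        min_h = min(min_h, x_obs - x - 2.0)
    return x, min_h

final_x, min_barrier = simulate()
```

In this toy run the vehicle approaches the safety margin (x near 8) without crossing it, since the filter scales the commanded speed down as the barrier value shrinks.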

Traj-Explainer: An Explainable and Robust Multi-modal Trajectory Prediction Approach

Oct 22, 2024

Pei Liu, Haipeng Liu, Yiqun Li, Tianyu Shi, Meixin Zhu, Ziyuan Pu

Abstract: Navigating complex traffic environments has been significantly enhanced by advancements in intelligent technologies, enabling accurate environment perception and trajectory prediction for automated vehicles. However, existing research often neglects the joint reasoning of scenario agents and lacks interpretability in trajectory prediction models, thereby limiting their practical application in real-world scenarios. To this end, an explainability-oriented trajectory prediction model is designed in this work, named Explainable Conditional Diffusion-based Multimodal Trajectory Prediction (Traj-Explainer), to retrieve the influencing factors of prediction and help understand the intrinsic mechanism of prediction. In Traj-Explainer, a modified conditional diffusion model is designed to capture the multimodal trajectory patterns of the scenario, while a modified Shapley Value model is assembled to rationally learn the importance of the global and scenario features. Numerical experiments are carried out on several trajectory prediction datasets, including the Waymo, NGSIM, HighD, and MoCAD datasets. Furthermore, we evaluate the identified input factors and find that they agree with human driving experience, indicating that the proposed model learns the prediction task appropriately. Code available in our open-source repository: https://anonymous.4open.science/r/Interpretable-Prediction.
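
The abstract above relies on Shapley values to rank feature importance. As background only, here is a minimal sketch of the classic exact Shapley computation over a small feature set. It is not the modified Shapley model from the paper; the feature names (`gap`, `speed`, `lane`) and the toy value function are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value_fn):
    """Exact Shapley values by enumerating every coalition (small n only)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):  # k = size of the coalition that p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p to coalition s.
                phi[p] += weight * (value_fn(s | {p}) - value_fn(s))
    return phi

# Hypothetical toy value function: additive contributions plus one
# interaction bonus when both "gap" and "speed" are present.
contrib = {"gap": 0.5, "speed": 0.3, "lane": 0.2}

def toy_value(coalition):
    total = sum(contrib[f] for f in coalition)
    if "gap" in coalition and "speed" in coalition:
        total += 0.2
    return total

phi = shapley_values(list(contrib), toy_value)
```

Exact enumeration costs on the order of 2^n value-function evaluations per feature, which is why practical explainers approximate or modify the computation.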

Preference Aligned Diffusion Planner for Quadrupedal Locomotion Control

Oct 17, 2024

Xinyi Yuan, Zhiwei Shang, Zifan Wang, Chenkai Wang, Zhao Shan, Zhenchao Qi, Meixin Zhu, Chenjia Bai, Xuelong Li

Abstract: Diffusion models demonstrate superior performance in capturing complex distributions from large-scale datasets, providing a promising solution for quadrupedal locomotion control. However, offline policies are sensitive to Out-of-Distribution (OOD) states due to the limited state coverage in the datasets. In this work, we propose a two-stage learning framework combining offline learning and online preference alignment for legged locomotion control. Through the offline stage, the diffusion planner learns the joint distribution of state-action sequences from expert datasets without using reward labels. Subsequently, we perform online interaction in the simulation environment based on the trained offline planner, which significantly addresses the OOD issues and improves the robustness. Specifically, we propose a novel weak preference labeling method without the ground-truth reward or human preferences. The proposed method exhibits superior stability and velocity tracking accuracy in pacing, trotting, and bounding gaits under both slow- and high-speed scenarios and can perform zero-shot transfer to the real Unitree Go1 robots. The project website for this paper is at https://shangjaven.github.io/preference-aligned-diffusion-legged/.

HeightFormer: A Semantic Alignment Monocular 3D Object Detection Method from Roadside Perspective

Oct 10, 2024

Pei Liu, Zihao Zhang, Haipeng Liu, Nanfang Zheng, Meixin Zhu, Ziyuan Pu

Abstract: On-board 3D object detection has received extensive attention as a critical technology for autonomous driving, while few studies have focused on applying roadside sensors to 3D traffic object detection. Existing studies project 2D image features to 3D features through height estimation based on the frustum. However, they do not consider height alignment or the extraction efficiency of bird's-eye-view features. We propose a novel 3D object detection framework integrating Spatial Former and Voxel Pooling Former to enhance 2D-to-3D projection based on height estimation. Extensive experiments were conducted using the Rope3D and DAIR-V2X-I datasets, and the results demonstrated that the proposed algorithm outperforms existing methods in detecting both vehicles and cyclists. These results indicate that the algorithm is robust and generalizes well across various detection scenarios. Improving the accuracy of 3D object detection on the roadside is conducive to building a safe and trustworthy intelligent transportation system of vehicle-road coordination and promoting the large-scale application of autonomous driving. The code and pre-trained models will be released on https://anonymous.4open.science/r/HeightFormer.

Automating Traffic Model Enhancement with AI Research Agent

Sep 25, 2024

Xusen Guo, Xinxi Yang, Mingxing Peng, Hongliang Lu, Meixin Zhu, Hai Yang

Abstract: Developing efficient traffic models is essential for optimizing transportation systems, yet current approaches remain time-intensive and susceptible to human errors due to their reliance on manual processes. Traditional workflows involve exhaustive literature reviews, formula optimization, and iterative testing, leading to inefficiencies in research. In response, we introduce the Traffic Research Agent (TR-Agent), an AI-driven system designed to autonomously develop and refine traffic models through an iterative, closed-loop process. Specifically, we divide the research pipeline into four key stages: idea generation, theory formulation, theory evaluation, and iterative optimization; and construct TR-Agent with four corresponding modules: Idea Generator, Code Generator, Evaluator, and Analyzer. Working in synergy, these modules retrieve knowledge from external resources, generate novel ideas, implement and debug models, and finally assess them on the evaluation datasets. Furthermore, the system continuously refines these models based on iterative feedback, enhancing research efficiency and model performance. Experimental results demonstrate that TR-Agent achieves significant performance improvements across multiple traffic models, including the Intelligent Driver Model (IDM) for car following, the MOBIL lane-changing model, and the Lighthill-Whitham-Richards (LWR) traffic flow model. Additionally, TR-Agent provides detailed explanations for its optimizations, allowing researchers to verify and build upon its improvements easily. This flexibility makes the framework a powerful tool for researchers in transportation and beyond. To further support research and collaboration, we have open-sourced both the code and data used in our experiments, facilitating broader access and enabling continued advancements in the field.

* 19 pages, 10 figures
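
The Intelligent Driver Model (IDM) named in the abstract above is a standard car-following formula, and a minimal sketch of its textbook form may help readers see what kind of model TR-Agent is asked to refine. This is the baseline IDM, not any TR-Agent-improved variant. The default parameter values are illustrative assumptions, and clamping the dynamic term of the desired gap at zero is a common variant rather than part of every formulation.

```python
import math

def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, a_max=1.0, b=2.0,
                     s0=2.0, delta=4.0):
    """Textbook IDM acceleration for a following vehicle.

    v     : follower speed (m/s)
    gap   : bumper-to-bumper distance to the leader (m), must be > 0
    dv    : approach rate, follower speed minus leader speed (m/s)
    v0, T, a_max, b, s0, delta : desired speed, time headway, maximum
        acceleration, comfortable deceleration, minimum gap, exponent
        (illustrative values, not calibrated to any dataset)
    """
    # Desired gap: minimum gap plus a speed- and approach-dependent term.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)

# Free road from standstill: acceleration is close to a_max.
free_road = idm_acceleration(v=0.0, gap=1e6, dv=0.0)
# Closing fast on a nearby leader: strong braking.
closing_in = idm_acceleration(v=20.0, gap=10.0, dv=10.0)
```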

From Words to Wheels: Automated Style-Customized Policy Generation for Autonomous Driving

Sep 18, 2024

Xu Han, Xianda Chen, Zhenghan Cai, Pinlong Cai, Meixin Zhu, Xiaowen Chu

Abstract: Autonomous driving technology has witnessed rapid advancements, with foundation models improving interactivity and user experiences. However, current autonomous vehicles (AVs) face significant limitations in delivering command-based driving styles. Most existing methods either rely on predefined driving styles that require expert input or use data-driven techniques like Inverse Reinforcement Learning to extract styles from driving data. These approaches, though effective in some cases, face challenges: difficulty obtaining specific driving data for style matching (e.g., in Robotaxis), inability to align driving style metrics with user preferences, and limitations to pre-existing styles, restricting customization and generalization to new commands. This paper introduces Words2Wheels, a framework that automatically generates customized driving policies based on natural language user commands. Words2Wheels employs a Style-Customized Reward Function to generate a Style-Customized Driving Policy without relying on prior driving data. By leveraging large language models and a Driving Style Database, the framework efficiently retrieves, adapts, and generalizes driving styles. A Statistical Evaluation module ensures alignment with user preferences. Experimental results demonstrate that Words2Wheels outperforms existing methods in accuracy, generalization, and adaptability, offering a novel solution for customized AV driving behavior. Code and demo available at https://yokhon.github.io/Words2Wheels/.

* 6 pages, 7 figures

EcoFollower: An Environment-Friendly Car Following Model Considering Fuel Consumption

Jul 22, 2024

Hui Zhong, Xianda Chen, PakHin Tiu, Hongliang Lu, Meixin Zhu

Abstract: To alleviate energy shortages and environmental impacts caused by transportation, this study introduces EcoFollower, a novel eco-car-following model developed using reinforcement learning (RL) to optimize fuel consumption in car-following scenarios. Employing the NGSIM datasets, the performance of EcoFollower was assessed in comparison with the well-established Intelligent Driver Model (IDM). The findings demonstrate that EcoFollower excels in simulating realistic driving behaviors, maintaining smooth vehicle operations, and closely matching the ground truth metrics of time-to-collision (TTC), headway, and comfort. Notably, the model achieved a significant reduction in fuel consumption, lowering it by 10.42% compared to actual driving scenarios. These results underscore the capability of RL-based models like EcoFollower to enhance autonomous vehicle algorithms, promoting safer and more energy-efficient driving strategies.
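
The safety metrics named in the abstract above, time-to-collision (TTC) and headway, have simple common definitions, sketched below. Exact definitions vary between studies (for example, whether the gap includes vehicle length), so these functions are assumptions for illustration and not necessarily the definitions used in the paper.

```python
def time_to_collision(gap, v_follower, v_leader):
    """Seconds until the gap closes at the current speeds.

    Returns infinity when the follower is not closing in on the leader.
    """
    closing_speed = v_follower - v_leader
    if closing_speed <= 0:
        return float("inf")
    return gap / closing_speed

def time_headway(gap, v_follower):
    """Seconds the follower needs to cover the current gap at its own speed."""
    if v_follower <= 0:
        return float("inf")
    return gap / v_follower

# Hypothetical snapshot: 20 m gap, follower at 15 m/s, leader at 10 m/s.
ttc = time_to_collision(20.0, 15.0, 10.0)
headway = time_headway(30.0, 15.0)
```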

Continual Learning for Adaptable Car-Following in Dynamic Traffic Environments

Jul 17, 2024

Xianda Chen, PakHin Tiu, Xu Han, Junjie Chen, Yuanfei Wu, Xinhu Zheng, Meixin Zhu

Abstract: The continual evolution of autonomous driving technology requires car-following models that can adapt to diverse and dynamic traffic environments. Traditional learning-based models often suffer from performance degradation when encountering unseen traffic patterns due to a lack of continual learning capabilities. This paper proposes a novel car-following model based on continual learning that addresses this limitation. Our framework incorporates Elastic Weight Consolidation (EWC) and Memory Aware Synapses (MAS) techniques to mitigate catastrophic forgetting and enable the model to learn incrementally from new traffic data streams. We evaluate the performance of the proposed model on the Waymo and Lyft datasets, which encompass various traffic scenarios. The results demonstrate that the continual learning techniques significantly outperform the baseline model, achieving 0% collision rates across all traffic conditions. This research contributes to the advancement of autonomous driving technology by fostering the development of more robust and adaptable car-following models.
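
Elastic Weight Consolidation, mentioned in the abstract above, mitigates forgetting by penalizing changes to parameters that were important for earlier data. The sketch below shows the standard quadratic EWC penalty with a diagonal Fisher information estimate. It is a generic illustration rather than the paper's implementation; the function names, the strength `lam`, and the toy numbers are assumptions.

```python
import numpy as np

def ewc_penalty(params, anchor_params, fisher_diag, lam=1.0):
    """Standard EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    anchor_params : parameters learned on the earlier task (theta*)
    fisher_diag   : diagonal Fisher information, one importance per parameter
    """
    diff = np.asarray(params, dtype=float) - np.asarray(anchor_params, dtype=float)
    return 0.5 * lam * float(np.sum(np.asarray(fisher_diag, dtype=float) * diff ** 2))

def total_loss(new_task_loss, params, anchor_params, fisher_diag, lam=1.0):
    """New-task loss plus the penalty that protects earlier knowledge."""
    return new_task_loss + ewc_penalty(params, anchor_params, fisher_diag, lam)

# Hypothetical two-parameter model: only the first parameter has moved.
penalty = ewc_penalty([1.0, 2.0], [0.0, 2.0], [4.0, 100.0], lam=2.0)
loss = total_loss(0.5, [1.0, 2.0], [0.0, 2.0], [4.0, 100.0], lam=2.0)
```

Parameters with large Fisher values are held close to their earlier values, while unimportant ones remain free to adapt to new traffic data.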
