Wenshuo Wang

MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception

Apr 03, 2025

Wenzhuo Liu, Wenshuo Wang, Yicheng Qiao, Qiannan Guo, Jiayin Zhu, Pengfei Li, Zilong Chen, Huiming Yang, Zhiwei Li, Lening Wang(+2 more)

Abstract: Advanced driver assistance systems require a comprehensive understanding of the driver's mental/physical state and traffic context, but existing works often neglect the potential benefits of joint learning between these tasks. This paper proposes MMTL-UniAD, a unified multi-modal multi-task learning framework that simultaneously recognizes driver behavior (e.g., looking around, talking), driver emotion (e.g., anxiety, happiness), vehicle behavior (e.g., parking, turning), and traffic context (e.g., traffic jam, traffic smooth). A key challenge is avoiding negative transfer between tasks, which can impair learning performance. To address this, we introduce two key components into the framework: a multi-axis region attention network that extracts global context-sensitive features, and a dual-branch multimodal embedding that learns multimodal embeddings from both task-shared and task-specific features. The former uses a multi-attention mechanism to extract task-relevant features, mitigating negative transfer caused by task-unrelated features. The latter employs a dual-branch structure to adaptively adjust task-shared and task-specific parameters, enhancing cross-task knowledge transfer while reducing task conflicts. We assess MMTL-UniAD on the AIDE dataset, using a series of ablation studies, and show that it outperforms state-of-the-art methods across all four tasks. The code is available at https://github.com/Wenzhuo-Liu/MMTL-UniAD.

MS-Mapping: An Uncertainty-Aware Large-Scale Multi-Session LiDAR Mapping System

Aug 07, 2024

Xiangcheng Hu, Jin Wu, Jianhao Jiao, Binqian Jiang, Wei Zhang, Wenshuo Wang, Ping Tan

Abstract: Large-scale multi-session LiDAR mapping is essential for a wide range of applications, including surveying, autonomous driving, crowdsourced mapping, and multi-agent navigation. However, existing approaches often struggle with data redundancy, robustness, and accuracy in complex environments. To address these challenges, we present MS-Mapping, a novel multi-session LiDAR mapping system that employs an incremental mapping scheme for robust and accurate map assembly in large-scale environments. Our approach introduces three key innovations: 1) A distribution-aware keyframe selection method that captures the subtle contributions of each point cloud frame to the map by analyzing the similarity of map distributions. This method effectively reduces data redundancy and pose graph size while speeding up graph optimization. 2) An uncertainty model that automatically performs least-squares adjustments according to the covariance matrix during graph optimization, improving mapping precision, robustness, and flexibility without the need for scene-specific parameter tuning. This uncertainty model enables our system to monitor pose uncertainty and avoid ill-posed optimizations, thereby increasing adaptability to diverse and challenging environments. 3) To ensure fair evaluation, we redesign baseline comparisons and the evaluation benchmark. Direct assessment of map accuracy demonstrates the superiority of the proposed MS-Mapping algorithm compared to state-of-the-art methods. In addition to employing public datasets such as Urban-Nav, FusionPortable, and Newer College, we conducted extensive experiments on a large 855 m × 636 m ground truth map, collecting over 20 km of indoor and outdoor data across more than ten sequences...
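The idea of letting covariance drive a least-squares adjustment can be illustrated with a deliberately tiny sketch. This is not MS-Mapping's uncertainty model or its pose-graph code; it only shows the textbook scalar case, where two noisy measurements of the same quantity are fused with inverse-variance weights. The function name `fuse_measurements` and the sample numbers are hypothetical.

```python
def fuse_measurements(x1, var1, x2, var2):
    """Weighted least-squares fusion of two scalar measurements.

    Each measurement is weighted by the inverse of its variance, so the
    more certain measurement pulls the estimate harder. Returns the fused
    estimate and its variance.
    """
    if var1 <= 0 or var2 <= 0:
        raise ValueError("variances must be positive")
    w1 = 1.0 / var1  # information (inverse variance) of measurement 1
    w2 = 1.0 / var2  # information (inverse variance) of measurement 2
    estimate = (w1 * x1 + w2 * x2) / (w1 + w2)
    variance = 1.0 / (w1 + w2)  # fused estimate is more certain than either input
    return estimate, variance


# Hypothetical example: a confident measurement (variance 1) at 0.0 and a
# noisier one (variance 4) at 10.0.
estimate, variance = fuse_measurements(0.0, 1.0, 10.0, 4.0)
print(estimate, variance)
```

In a full pose graph, the same principle applies with matrices: each constraint's residual is weighted by the inverse of its covariance, so no hand-tuned per-scene weights are needed.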

* 18 pages, 22 figures

100 Drivers, 2200 km: A Natural Dataset of Driving Style toward Human-centered Intelligent Driving Systems

Jun 12, 2024

Chaopeng Zhang, Wenshuo Wang, Zhaokun Chen, Junqiang Xi

Abstract: Effective driving style analysis is critical to developing human-centered intelligent driving systems that consider drivers' preferences. However, the approaches and conclusions of most related studies are diverse and inconsistent because no unified datasets tagged with driving styles exist as a reliable benchmark. The absence of explicit driving style labels makes verifying different approaches and algorithms difficult. This paper provides a new benchmark by constructing a natural dataset of Driving Style (100-DrivingStyle) tagged with the subjective evaluation of 100 drivers' driving styles. In this dataset, each driver's driving style is quantified subjectively by the driver and by an expert using a Likert-scale questionnaire. The testing routes are selected to cover various driving scenarios, including highways, urban roads, highway ramps, and signalized traffic. The collected driving data consists of lateral and longitudinal manipulation information, including steering angle, steering speed, lateral acceleration, throttle position, throttle rate, brake pressure, etc. This dataset is the first to provide detailed manipulation data with driving-style tags, and we demonstrate its benchmark function using six classifiers. The 100-DrivingStyle dataset is available via https://github.com/chaopengzhang/100-DrivingStyle-Dataset

Shareable Driving Style Learning and Analysis with a Hierarchical Latent Model

Oct 24, 2023

Chaopeng Zhang, Wenshuo Wang, Zhaokun Chen, Jian Zhang, Lijun Sun, Junqiang Xi

Abstract: Driving style is usually used to characterize driving behavior for a driver or a group of drivers. However, it remains unclear how one individual's driving style shares common ground with that of other drivers. Our insight is that driving behavior is a sequence of responses to the weighted mixture of latent driving styles that are shareable within and between individuals. To this end, this paper develops a hierarchical latent model to learn the relationship between driving behavior and driving styles. We first propose a fragment-based approach to represent complex sequential driving behavior, allowing driving behavior to be represented sufficiently in a low-dimensional feature space. Then, we provide an analytical formulation for the interaction of driving behavior and shareable driving style with a hierarchical latent model by introducing the mechanism of Dirichlet allocation. Our developed model is finally validated and verified with 100 drivers in naturalistic driving settings on urban roads and highways. Experimental results reveal that individuals share driving styles within and between them. We also analyzed the influence of personalities (e.g., age, gender, and driving experience) on driving styles and found that a naturally aggressive driver would not always keep driving aggressively (i.e., could behave calmly sometimes) but with a higher proportion of aggressiveness than other types of drivers.

Interactive Car-Following: Matters but NOT Always

Jul 30, 2023

Chengyuan Zhang, Rui Chen, Jiacheng Zhu, Wenshuo Wang, Changliu Liu, Lijun Sun

Abstract: Following a leading vehicle is a daily but challenging task because it requires adapting to various traffic conditions and the leading vehicle's behaviors. However, the question "Does the following vehicle always actively react to the leading vehicle?" remains open. To seek the answer, we propose a novel metric to quantify the interaction intensity within car-following pairs. The quantified interaction intensity enables us to recognize interactive and non-interactive car-following scenarios and derive corresponding policies for each scenario. Then, we develop an interaction-aware switching control framework with interactive and non-interactive policies, achieving human-level car-following performance. Extensive simulations demonstrate that our interaction-aware switching control framework achieves improved control performance and data efficiency compared to unified control strategies. Moreover, the experimental results reveal that human drivers would not always keep reacting to their leading vehicle but occasionally take safety-critical or intentional actions: interaction matters, but not always.

* Accepted by 26th IEEE International Conference on Intelligent Transportation Systems ITSC 2023

Efficient Reinforcement Learning for Autonomous Driving with Parameterized Skills and Priors

May 08, 2023

Letian Wang, Jie Liu, Hao Shao, Wenshuo Wang, Ruobing Chen, Yu Liu, Steven L. Waslander

Abstract: When autonomous vehicles are deployed on public roads, they will encounter countless and diverse driving situations. Many manually designed driving policies are difficult to scale to the real world. Fortunately, reinforcement learning has shown great success in many tasks by automatic trial and error. However, when it comes to autonomous driving in interactive dense traffic, RL agents either fail to learn reasonable performance or necessitate a large amount of data. Our insight is that when humans learn to drive, they will 1) make decisions over the high-level skill space instead of the low-level control space and 2) leverage expert prior knowledge rather than learning from scratch. Inspired by this, we propose ASAP-RL, an efficient reinforcement learning algorithm for autonomous driving that simultaneously leverages motion skills and expert priors. We first parameterize motion skills, which are diverse enough to cover various complex driving scenarios and situations. A skill parameter inverse recovery method is proposed to convert expert demonstrations from control space to skill space. A simple but effective double initialization technique is proposed to leverage expert priors while bypassing the issue of expert suboptimality and early performance degradation. We validate our proposed method on interactive dense-traffic driving tasks given simple and sparse rewards. Experimental results show that our method can lead to higher learning efficiency and better driving performance relative to previous methods that exploit skills and priors differently. Code is open-sourced to facilitate further research.

* Robotics: Science and Systems (RSS 2023)

Understanding Bugs in Multi-Language Deep Learning Frameworks

Mar 05, 2023

Zengyang Li, Sicheng Wang, Wenshuo Wang, Peng Liang, Ran Mo, Bing Li

Abstract: Deep learning frameworks (DLFs) have been playing an increasingly important role in this intelligence age since they act as a basic infrastructure for an increasingly wide range of AI-based applications. Meanwhile, as multi-programming-language (MPL) software systems, DLFs inevitably suffer from bugs caused by the use of multiple programming languages (PLs). Hence, it is of paramount significance to understand the bugs (especially the bugs involving multiple PLs, i.e., MPL bugs) of DLFs, which can provide a foundation for preventing, detecting, and resolving bugs in the development of DLFs. To this end, we manually analyzed 1497 bugs in three MPL DLFs, namely MXNet, PyTorch, and TensorFlow. First, we classified bugs in these DLFs into 12 types (e.g., algorithm design bugs and memory bugs) according to their bug labels and characteristics. Second, we further explored the impacts of different bug types on the development of DLFs, and found that deployment bugs and memory bugs have the greatest negative impact on the development of DLFs, each in different aspects. Third, we found that 28.6%, 31.4%, and 16.0% of bugs in MXNet, PyTorch, and TensorFlow are MPL bugs, respectively; the PL combination of Python and C/C++ is the most used, appearing in fixes for more than 92% of MPL bugs in all DLFs. Finally, the code change complexity of MPL bug fixes is significantly greater than that of single-programming-language (SPL) bug fixes in all three DLFs, while in PyTorch MPL bug fixes have longer open time and greater communication complexity than SPL bug fixes. These results provide insights for bug management in DLFs.

* The 31st IEEE/ACM International Conference on Program Comprehension (ICPC)

On Trustworthy Decision-Making Process of Human Drivers from the View of Perceptual Uncertainty Reduction

Oct 15, 2022

Huanjie Wang, Haibin Liu, Wenshuo Wang, Lijun Sun

Abstract: Humans are experts in making decisions for challenging driving tasks with uncertainties. Many efforts have been made to model the decision-making process of human drivers at the behavior level. However, few studies explain how human drivers actively make reliable sequential decisions to complete interactive driving tasks in an uncertain environment. This paper argues that human drivers intently search for actions to reduce the uncertainty of their perception of the environment, i.e., perceptual uncertainty, to a low level that allows them to make a trustworthy decision easily. This paper provides a proof-of-concept framework to empirically reveal that human drivers' perceptual uncertainty decreases when executing interactive tasks with uncertainties. We first introduce an explainable artificial intelligence approach (i.e., SHapley Additive exPlanation, SHAP) to determine the salient features on which human drivers make decisions. Then, we use entropy-based measures to quantify the drivers' perceptual changes in these ranked salient features across the decision-making process, reflecting the changes in uncertainties. The validation and verification of our proposed method are conducted in the highway on-ramp merging scenario with congested traffic using the INTERACTION dataset. Experimental results support that human drivers intentionally seek information to reduce their perceptual uncertainties in the number and rank of salient features of their perception of environments to make a trustworthy decision.
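To give a feel for what an entropy-based measure over salient features might look like, here is a minimal sketch. It is an assumption-laden illustration, not the paper's formulation: it simply normalizes the absolute values of a feature-importance vector (such as one produced by SHAP) into a distribution and computes its Shannon entropy. The function name `importance_entropy` and the two sample vectors are hypothetical.

```python
import math


def importance_entropy(importances):
    """Shannon entropy (in bits) of normalized absolute feature importances.

    High entropy means importance is spread across many features (diffuse
    attention); low entropy means a few features dominate.
    """
    magnitudes = [abs(value) for value in importances]
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("at least one importance must be non-zero")
    probabilities = [m / total for m in magnitudes]
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical importances at two moments of a merging maneuver:
# early on, four features matter equally; later, one feature dominates.
early = [0.25, 0.25, 0.25, 0.25]
late = [0.97, 0.01, 0.01, 0.01]

print(importance_entropy(early))  # evenly spread importances
print(importance_entropy(late))   # concentrated importances, lower entropy
```

Under this toy definition, a drop in entropy over the course of a maneuver would correspond to the driver's decision resting on fewer, more dominant features.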

* 12 pages, 12 figures

Accelerating Reinforcement Learning for Autonomous Driving using Task-Agnostic and Ego-Centric Motion Skills

Sep 24, 2022

Tong Zhou, Letian Wang, Ruobing Chen, Wenshuo Wang, Yu Liu

Abstract: Efficient and effective exploration in continuous space is a central problem in applying reinforcement learning (RL) to autonomous driving. Skills learned from expert demonstrations or designed for specific tasks can benefit the exploration, but they are usually costly to collect, unbalanced or sub-optimal, or fail to transfer to diverse tasks. However, human drivers can adapt to varied driving tasks without demonstrations by exploring the entire skill space efficiently and in a structured way, rather than a limited space of task-specific skills. Inspired by this, we propose an RL algorithm that explores all feasible motion skills instead of a limited set of task-specific and object-centric skills. Without demonstrations, our method can still perform well in diverse tasks. First, we build a task-agnostic and ego-centric (TaEc) motion skill library from a pure motion perspective, which is diverse enough to be reusable in different complex tasks. The motion skills are then encoded into a low-dimensional latent skill space, in which RL can explore efficiently. Validations in various challenging driving scenarios demonstrate that our proposed method, TaEc-RL, outperforms its counterparts significantly in learning efficiency and task performance.

* 8 pages, 8 figures

Social Interactions for Autonomous Driving: A Review and Perspective

Aug 17, 2022

Wenshuo Wang, Letian Wang, Chengyuan Zhang, Changliu Liu, Lijun Sun

Abstract: No human drives a car in a vacuum; drivers must negotiate with other road users to achieve their goals in social traffic scenes. A rational human driver can interact with other road users in a socially compatible way through implicit communications to complete their driving tasks smoothly in interaction-intensive, safety-critical environments. This paper aims to review the existing approaches and theories to help understand and rethink the interactions among human drivers toward social autonomous driving. We take this survey to seek the answers to a series of fundamental questions: 1) What is social interaction in road traffic scenes? 2) How to measure and evaluate social interaction? 3) How to model and reveal the process of social interaction? 4) How do human drivers reach an implicit agreement and negotiate smoothly in social interaction? This paper reviews various approaches to modeling and learning the social interactions between human drivers, ranging from optimization theory and graphical models to social force theory and behavioral & cognitive science. We also highlight some new directions, critical challenges, and open questions for future research.

* 122 pages, 36 figures
