Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chunxiang Wang

Depth-aware Fusion Method based on Image and 4D Radar Spectrum for 3D Object Detection

Feb 21, 2025

Yue Sun, Yeqiang Qian, Chunxiang Wang, Ming Yang

Abstract:Safety and reliability are crucial for the public acceptance of autonomous driving. To ensure accurate and reliable environmental perception, intelligent vehicles must exhibit accuracy and robustness in various environments. Millimeter-wave radar, known for its high penetration capability, can operate effectively in adverse weather conditions such as rain, snow, and fog. Traditional 3D millimeter-wave radars can only provide range, Doppler, and azimuth information for objects. Although the recent emergence of 4D millimeter-wave radars has added elevation resolution, the radar point clouds remain sparse due to Constant False Alarm Rate (CFAR) operations. In contrast, cameras offer rich semantic details but are sensitive to lighting and weather conditions. Hence, this paper leverages these two highly complementary and cost-effective sensors, 4D millimeter-wave radar and camera. By integrating 4D radar spectra with depth-aware camera images and employing attention mechanisms, we fuse texture-rich images with depth-rich radar data in the Bird's Eye View (BEV) perspective, enhancing 3D object detection. Additionally, we propose using GAN-based networks to generate depth images from radar spectra in the absence of depth sensors, further improving detection accuracy.

* The 2024 IEEE International Conference on Robotics and Biomimetics (IEEE ROBIO 2024)

Via

Access Paper or Ask Questions

Cross-Modal Visual Relocalization in Prior LiDAR Maps Utilizing Intensity Textures

Dec 02, 2024

Qiyuan Shen, Hengwang Zhao, Weihao Yan, Chunxiang Wang, Tong Qin, Ming Yang

Figure 1 for Cross-Modal Visual Relocalization in Prior LiDAR Maps Utilizing Intensity Textures

Figure 2 for Cross-Modal Visual Relocalization in Prior LiDAR Maps Utilizing Intensity Textures

Figure 3 for Cross-Modal Visual Relocalization in Prior LiDAR Maps Utilizing Intensity Textures

Figure 4 for Cross-Modal Visual Relocalization in Prior LiDAR Maps Utilizing Intensity Textures

Abstract:Cross-modal localization has drawn increasing attention in recent years, while the visual relocalization in prior LiDAR maps is less studied. Related methods usually suffer from inconsistency between the 2D texture and 3D geometry, neglecting the intensity features in the LiDAR point cloud. In this paper, we propose a cross-modal visual relocalization system in prior LiDAR maps utilizing intensity textures, which consists of three main modules: map projection, coarse retrieval, and fine relocalization. In the map projection module, we construct the database of intensity channel map images leveraging the dense characteristic of panoramic projection. The coarse retrieval module retrieves the top-K most similar map images to the query image from the database, and retains the top-K' results by covisibility clustering. The fine relocalization module applies a two-stage 2D-3D association and a covisibility inlier selection method to obtain robust correspondences for 6DoF pose estimation. The experimental results on our self-collected datasets demonstrate the effectiveness in both place recognition and pose estimation tasks.

Via

Access Paper or Ask Questions

End-to-end Driving in High-Interaction Traffic Scenarios with Reinforcement Learning

Oct 03, 2024

Yueyuan Li, Mingyang Jiang, Songan Zhang, Wei Yuan, Chunxiang Wang, Ming Yang

Abstract:Dynamic and interactive traffic scenarios pose significant challenges for autonomous driving systems. Reinforcement learning (RL) offers a promising approach by enabling the exploration of driving policies beyond the constraints of pre-collected datasets and predefined conditions, particularly in complex environments. However, a critical challenge lies in effectively extracting spatial and temporal features from sequences of high-dimensional, multi-modal observations while minimizing the accumulation of errors over time. Additionally, efficiently guiding large-scale RL models to converge on optimal driving policies without frequent failures during the training process remains tricky. We propose an end-to-end model-based RL algorithm named Ramble to address these issues. Ramble processes multi-view RGB images and LiDAR point clouds into low-dimensional latent features to capture the context of traffic scenarios at each time step. A transformer-based architecture is then employed to model temporal dependencies and predict future states. By learning a dynamics model of the environment, Ramble can foresee upcoming traffic events and make more informed, strategic decisions. Our implementation demonstrates that prior experience in feature extraction and decision-making plays a pivotal role in accelerating the convergence of RL models toward optimal driving policies. Ramble achieves state-of-the-art performance regarding route completion rate and driving score on the CARLA Leaderboard 2.0, showcasing its effectiveness in managing complex and dynamic traffic situations.

* 10 pages, 3 figures, experiment under progress, only to demonstrate the originality of the method

Via

Access Paper or Ask Questions

HOPE: A Reinforcement Learning-based Hybrid Policy Path Planner for Diverse Parking Scenarios

May 31, 2024

Mingyang Jiang, Yueyuan Li, Songan Zhang, Chunxiang Wang, Ming Yang

Abstract:Path planning plays a pivotal role in automated parking, yet current methods struggle to efficiently handle the intricate and diverse parking scenarios. One potential solution is the reinforcement learning-based method, leveraging its exploration in unrecorded situations. However, a key challenge lies in training reinforcement learning methods is the inherent randomness in converging to a feasible policy. This paper introduces a novel solution, the Hybrid POlicy Path plannEr (HOPE), which integrates a reinforcement learning agent with Reeds-Shepp curves, enabling effective planning across diverse scenarios. The paper presents a method to calculate and implement an action mask mechanism in path planning, significantly boosting the efficiency and effectiveness of reinforcement learning training. A transformer is employed as the network structure to fuse environmental information and generate planned paths. To facilitate the training and evaluation of the proposed planner, we propose a criterion for categorizing the difficulty level of parking scenarios based on space and obstacle distribution. Experimental results demonstrate that our approach outperforms typical rule-based algorithms and traditional reinforcement learning methods, showcasing higher planning success rates and generalization across various scenarios. The code for our solution will be openly available on \href{GitHub}{https://github.com/jiamiya/HOPE}. % after the paper's acceptance.

* 10 pages, 6 tables, 5 figures, 1 page appendix

Via

Access Paper or Ask Questions

AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection

May 21, 2024

Zizhao Chen, Yeqiang Qian, Xiaoxiao Yang, Chunxiang Wang, Ming Yang

Abstract:Multispectral pedestrian detection has been shown to be effective in improving performance within complex illumination scenarios. However, prevalent double-stream networks in multispectral detection employ two separate feature extraction branches for multi-modal data, leading to nearly double the inference time compared to single-stream networks utilizing only one feature extraction branch. This increased inference time has hindered the widespread employment of multispectral pedestrian detection in embedded devices for autonomous systems. To address this limitation, various knowledge distillation methods have been proposed. However, traditional distillation methods focus only on the fusion features and ignore the large amount of information in the original multi-modal features, thereby restricting the student network's performance. To tackle the challenge, we introduce the Adaptive Modal Fusion Distillation (AMFD) framework, which can fully utilize the original modal features of the teacher network. Specifically, a Modal Extraction Alignment (MEA) module is utilized to derive learning weights for student networks, integrating focal and global attention mechanisms. This methodology enables the student network to acquire optimal fusion strategies independent from that of teacher network without necessitating an additional feature fusion module. Furthermore, we present the SMOD dataset, a well-aligned challenging multispectral dataset for detection. Extensive experiments on the challenging KAIST, LLVIP and SMOD datasets are conducted to validate the effectiveness of AMFD. The results demonstrate that our method outperforms existing state-of-the-art methods in both reducing log-average Miss Rate and improving mean Average Precision. The code is available at https://github.com/bigD233/AMFD.git.

Via

Access Paper or Ask Questions

A Survey of Simulators for Autonomous Driving: Taxonomy, Challenges, and Evaluation Metrics

Nov 18, 2023

Yueyuan Li, Wei Yuan, Weihao Yan, Qiyuan Shen, Chunxiang Wang, Ming Yang

Abstract:Simulators have irreplaceable importance for the research and development of autonomous driving. Besides saving resources, labor, and time, simulation is the only feasible way to reproduce many severe accident scenarios. Despite their widespread adoption across academia and industry, there is an absence in the evolutionary trajectory of simulators and critical discourse on their limitations. To bridge the gap in research, this paper conducts an in-depth review of simulators for autonomous driving. It delineates the three-decade development into three stages: specialized development period, gap period, and comprehensive development, from which it detects a trend of implementing comprehensive functionalities and open-source accessibility. Then it classifies the simulators by functions, identifying five categories: traffic flow simulator, vehicle dynamics simulator, scenario editor, sensory data generator, and driving strategy validator. Simulators that amalgamate diverse features are defined as comprehensive simulators. By investigating commercial and open-source simulators, this paper reveals that the critical issues faced by simulators primarily revolve around fidelity and efficiency concerns. This paper justifies that enhancing the realism of adverse weather simulation, automated map reconstruction, and interactive traffic participants will bolster credibility. Concurrently, headless simulation and multiple-speed simulation techniques will exploit the theoretic advantages. Moreover, this paper delves into potential solutions for the identified issues. It explores qualitative and quantitative evaluation metrics to assess the simulator's performance. This paper guides users to find suitable simulators efficiently and provides instructive suggestions for developers to improve simulator efficacy purposefully.

* 17 pages, 7 figures, 2 tables

Via

Access Paper or Ask Questions

SUNet: Scale-aware Unified Network for Panoptic Segmentation

Sep 07, 2022

Weihao Yan, Yeqiang Qian, Chunxiang Wang, Ming Yang

Figure 1 for SUNet: Scale-aware Unified Network for Panoptic Segmentation

Figure 2 for SUNet: Scale-aware Unified Network for Panoptic Segmentation

Figure 3 for SUNet: Scale-aware Unified Network for Panoptic Segmentation

Figure 4 for SUNet: Scale-aware Unified Network for Panoptic Segmentation

Abstract:Panoptic segmentation combines the advantages of semantic and instance segmentation, which can provide both pixel-level and instance-level environmental perception information for intelligent vehicles. However, it is challenged with segmenting objects of various scales, especially on extremely large and small ones. In this work, we propose two lightweight modules to mitigate this problem. First, Pixel-relation Block is designed to model global context information for large-scale things, which is based on a query-independent formulation and brings small parameter increments. Then, Convectional Network is constructed to collect extra high-resolution information for small-scale stuff, supplying more appropriate semantic features for the downstream segmentation branches. Based on these two modules, we present an end-to-end Scale-aware Unified Network (SUNet), which is more adaptable to multi-scale objects. Extensive experiments on Cityscapes and COCO demonstrate the effectiveness of the proposed methods.

* 10 pages, 7 figures, 8 tables

Via

Access Paper or Ask Questions

Threshold-adaptive Unsupervised Focal Loss for Domain Adaptation of Semantic Segmentation

Aug 23, 2022

Weihao Yan, Yeqiang Qian, Chunxiang Wang, Ming Yang

Figure 1 for Threshold-adaptive Unsupervised Focal Loss for Domain Adaptation of Semantic Segmentation

Figure 2 for Threshold-adaptive Unsupervised Focal Loss for Domain Adaptation of Semantic Segmentation

Figure 3 for Threshold-adaptive Unsupervised Focal Loss for Domain Adaptation of Semantic Segmentation

Figure 4 for Threshold-adaptive Unsupervised Focal Loss for Domain Adaptation of Semantic Segmentation

Abstract:Semantic segmentation is an important task for intelligent vehicles to understand the environment. Current deep learning methods require large amounts of labeled data for training. Manual annotation is expensive, while simulators can provide accurate annotations. However, the performance of the semantic segmentation model trained with the data of the simulator will significantly decrease when applied in the actual scene. Unsupervised domain adaptation (UDA) for semantic segmentation has recently gained increasing research attention, aiming to reduce the domain gap and improve the performance on the target domain. In this paper, we propose a novel two-stage entropy-based UDA method for semantic segmentation. In stage one, we design a threshold-adaptative unsupervised focal loss to regularize the prediction in the target domain, which has a mild gradient neutralization mechanism and mitigates the problem that hard samples are barely optimized in entropy-based methods. In stage two, we introduce a data augmentation method named cross-domain image mixing (CIM) to bridge the semantic knowledge from two domains. Our method achieves state-of-the-art 58.4% and 59.6% mIoUs on SYNTHIA-to-Cityscapes and GTA5-to-Cityscapes using DeepLabV2 and competitive performance using the lightweight BiSeNet.

* 10 pages, 8 figure, 7 tables, submitted to T-ITS on April 2, 2022

Via

Access Paper or Ask Questions

BAANet: Learning Bi-directional Adaptive Attention Gates for Multispectral Pedestrian Detection

Dec 04, 2021

Xiaoxiao Yang, Yeqian Qiang, Huijie Zhu, Chunxiang Wang, Ming Yang

Figure 1 for BAANet: Learning Bi-directional Adaptive Attention Gates for Multispectral Pedestrian Detection

Figure 2 for BAANet: Learning Bi-directional Adaptive Attention Gates for Multispectral Pedestrian Detection

Figure 3 for BAANet: Learning Bi-directional Adaptive Attention Gates for Multispectral Pedestrian Detection

Figure 4 for BAANet: Learning Bi-directional Adaptive Attention Gates for Multispectral Pedestrian Detection

Abstract:Thermal infrared (TIR) image has proven effectiveness in providing temperature cues to the RGB features for multispectral pedestrian detection. Most existing methods directly inject the TIR modality into the RGB-based framework or simply ensemble the results of two modalities. This, however, could lead to inferior detection performance, as the RGB and TIR features generally have modality-specific noise, which might worsen the features along with the propagation of the network. Therefore, this work proposes an effective and efficient cross-modality fusion module called Bi-directional Adaptive Attention Gate (BAA-Gate). Based on the attention mechanism, the BAA-Gate is devised to distill the informative features and recalibrate the representations asymptotically. Concretely, a bi-direction multi-stage fusion strategy is adopted to progressively optimize features of two modalities and retain their specificity during the propagation. Moreover, an adaptive interaction of BAA-Gate is introduced by the illumination-based weighting strategy to adaptively adjust the recalibrating and aggregating strength in the BAA-Gate and enhance the robustness towards illumination changes. Considerable experiments on the challenging KAIST dataset demonstrate the superior performance of our method with satisfactory speed.

Via

Access Paper or Ask Questions

Map-Enhanced Ego-Lane Detection in the Missing Feature Scenarios

Apr 04, 2020

Xiaoliang Wang, Yeqiang Qian, Chunxiang Wang, Ming Yang

Figure 1 for Map-Enhanced Ego-Lane Detection in the Missing Feature Scenarios

Figure 2 for Map-Enhanced Ego-Lane Detection in the Missing Feature Scenarios

Figure 3 for Map-Enhanced Ego-Lane Detection in the Missing Feature Scenarios

Figure 4 for Map-Enhanced Ego-Lane Detection in the Missing Feature Scenarios

Abstract:As one of the most important tasks in autonomous driving systems, ego-lane detection has been extensively studied and has achieved impressive results in many scenarios. However, ego-lane detection in the missing feature scenarios is still an unsolved problem. To address this problem, previous methods have been devoted to proposing more complicated feature extraction algorithms, but they are very time-consuming and cannot deal with extreme scenarios. Different from others, this paper exploits prior knowledge contained in digital maps, which has a strong capability to enhance the performance of detection algorithms. Specifically, we employ the road shape extracted from OpenStreetMap as lane model, which is highly consistent with the real lane shape and irrelevant to lane features. In this way, only a few lane features are needed to eliminate the position error between the road shape and the real lane, and a search-based optimization algorithm is proposed. Experiments show that the proposed method can be applied to various scenarios and can run in real-time at a frequency of 20 Hz. At the same time, we evaluated the proposed method on the public KITTI Lane dataset where it achieves state-of-the-art performance. Moreover, our code will be open source after publication.

* There has some mistake in experiments

Via

Access Paper or Ask Questions