Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhiwen Zeng

RESAR-BEV: An Explainable Progressive Residual Autoregressive Approach for Camera-Radar Fusion in BEV Segmentation

May 10, 2025

Zhiwen Zeng, Yunfei Yin, Zheng Yuan, Argho Dey, Xianjian Bao

Abstract:Bird's-Eye-View (BEV) semantic segmentation provides comprehensive environmental perception for autonomous driving but suffers multi-modal misalignment and sensor noise. We propose RESAR-BEV, a progressive refinement framework that advances beyond single-step end-to-end approaches: (1) progressive refinement through residual autoregressive learning that decomposes BEV segmentation into interpretable coarse-to-fine stages via our Drive-Transformer and Modifier-Transformer residual prediction cascaded architecture, (2) robust BEV representation combining ground-proximity voxels with adaptive height offsets and dual-path voxel feature encoding (max+attention pooling) for efficient feature extraction, and (3) decoupled supervision with offline Ground Truth decomposition and online joint optimization to prevent overfitting while ensuring structural coherence. Experiments on nuScenes demonstrate RESAR-BEV achieves state-of-the-art performance with 54.0% mIoU across 7 essential driving-scene categories while maintaining real-time capability at 14.6 FPS. The framework exhibits robustness in challenging scenarios of long-range perception and adverse weather conditions.

* This work was submitted to IEEE Transactions on Intelligent Transportation Systems (T-ITS) on 09-May-2025

Via

Access Paper or Ask Questions

LuSeg: Efficient Negative and Positive Obstacles Segmentation via Contrast-Driven Multi-Modal Feature Fusion on the Lunar

Mar 14, 2025

Shuaifeng Jiao, Zhiwen Zeng, Zhuoqun Su, Xieyuanli Chen, Zongtan Zhou, Huimin Lu

Abstract:As lunar exploration missions grow increasingly complex, ensuring safe and autonomous rover-based surface exploration has become one of the key challenges in lunar exploration tasks. In this work, we have developed a lunar surface simulation system called the Lunar Exploration Simulator System (LESS) and the LunarSeg dataset, which provides RGB-D data for lunar obstacle segmentation that includes both positive and negative obstacles. Additionally, we propose a novel two-stage segmentation network called LuSeg. Through contrastive learning, it enforces semantic consistency between the RGB encoder from Stage I and the depth encoder from Stage II. Experimental results on our proposed LunarSeg dataset and additional public real-world NPO road obstacle dataset demonstrate that LuSeg achieves state-of-the-art segmentation performance for both positive and negative obstacles while maintaining a high inference speed of approximately 57\,Hz. We have released the implementation of our LESS system, LunarSeg dataset, and the code of LuSeg at:https://github.com/nubot-nudt/LuSeg.

Via

Access Paper or Ask Questions

HI-GVF: Shared Control based on Human-Influenced Guiding Vector Fields for Human-multi-robot Cooperation

Feb 17, 2025

Pengming Zhu, Zongtan Zhou, Weijia Yao, Wei Dai, Zhiwen Zeng, Huimin Lu

Abstract:Human-multi-robot shared control leverages human decision-making and robotic autonomy to enhance human-robot collaboration. While widely studied, existing systems often adopt a leader-follower model, limiting robot autonomy to some extent. Besides, a human is required to directly participate in the motion control of robots through teleoperation, which significantly burdens the operator. To alleviate these two issues, we propose a layered shared control computing framework using human-influenced guiding vector fields (HI-GVF) for human-robot collaboration. HI-GVF guides the multi-robot system along a desired path specified by the human. Then, an intention field is designed to merge the human and robot intentions, accelerating the propagation of the human intention within the multi-robot system. Moreover, we give the stability analysis of the proposed model and use collision avoidance based on safety barrier certificates to fine-tune the velocity. Eventually, considering the firefighting task as an example scenario, we conduct simulations and experiments using multiple human-robot interfaces (brain-computer interface, myoelectric wristband, eye-tracking), and the results demonstrate that our proposed approach boosts the effectiveness and performance of the task.

Via

Access Paper or Ask Questions

Consensus-Based Dynamic Task Allocation for Multi-Robot System Considering Payloads Consumption

Dec 13, 2024

Xuekai Qiu, Pengming Zhu, Yiming Hu, Zhiwen Zeng, Huimin Lu

Abstract:This paper presents a consensus-based payload algorithm (CBPA) to deal with the condition of robots' capability decrease for multi-robot task allocation. During the execution of complex tasks, robots' capabilities could decrease with the consumption of payloads, which causes a problem that the robot coalition would not meet the tasks' requirements in real time. The proposed CBPA is an enhanced version of the consensus-based bundle algorithm (CBBA) and comprises two primary core phases: the payload bundle construction and consensus phases. In the payload bundle construction phase, CBPA introduces a payload assignment matrix to track the payloads carried by the robots and the demands of multi-robot tasks in real time. Then, robots share their respective payload assignment matrix in the consensus phase. These two phases are iterated to dynamically adjust the number of robots performing multi-robot tasks and the number of tasks each robot performs and obtain conflict-free results to ensure that the robot coalition meets the demand and completes all tasks as quickly as possible. Physical experiment shows that CBPA is appropriate in complex and dynamic scenarios where robots need to collaborate and task requirements are tightly coupled to the robots' payloads. Numerical experiments show that CBPA has higher total task gains than CBBA.

Via

Access Paper or Ask Questions

SPF-EMPC Planner: A real-time multi-robot trajectory planner for complex environments with uncertainties

Oct 17, 2024

Peng Liu, Pengming Zhu, Zhiwen Zeng, Xuekai Qiu, Yu Wang, Huimin Lu

Abstract:In practical applications, the unpredictable movement of obstacles and the imprecise state observation of robots introduce significant uncertainties for the swarm of robots, especially in cluster environments. However, existing methods are difficult to realize safe navigation, considering uncertainties, complex environmental structures, and robot swarms. This paper introduces an extended state model predictive control planner with a safe probability field to address the multi-robot navigation problem in complex, dynamic, and uncertain environments. Initially, the safe probability field offers an innovative approach to model the uncertainty of external dynamic obstacles, combining it with an unconstrained optimization method to generate safe trajectories for multi-robot online. Subsequently, the extended state model predictive controller can accurately track these generated trajectories while considering the robots' inherent model constraints and state uncertainty, thus ensuring the practical feasibility of the planned trajectories. Simulation experiments show a success rate four times higher than that of state-of-the-art algorithms. Physical experiments demonstrate the method's ability to operate in real-time, enabling safe navigation for multi-robot in uncertain environments.

Via

Access Paper or Ask Questions

DVRP-MHSI: Dynamic Visualization Research Platform for Multimodal Human-Swarm Interaction

Aug 20, 2024

Pengming Zhu, Zhiwen Zeng, Weijia Yao, Wei Dai, Huimin Lu, Zongtan Zhou

Abstract:In recent years, there has been a significant amount of research on algorithms and control methods for distributed collaborative robots. However, the emergence of collective behavior in a swarm is still difficult to predict and control. Nevertheless, human interaction with the swarm helps render the swarm more predictable and controllable, as human operators can utilize intuition or knowledge that is not always available to the swarm. Therefore, this paper designs the Dynamic Visualization Research Platform for Multimodal Human-Swarm Interaction (DVRP-MHSI), which is an innovative open system that can perform real-time dynamic visualization and is specifically designed to accommodate a multitude of interaction modalities (such as brain-computer, eye-tracking, electromyographic, and touch-based interfaces), thereby expediting progress in human-swarm interaction research. Specifically, the platform consists of custom-made low-cost omnidirectional wheeled mobile robots, multitouch screens and two workstations. In particular, the mutitouch screens can recognize human gestures and the shapes of objects placed on them, and they can also dynamically render diverse scenes. One of the workstations processes communication information within robots and the other one implements human-robot interaction methods. The development of DVRP-MHSI frees researchers from hardware or software details and allows them to focus on versatile swarm algorithms and human-swarm interaction methods without being limited to fixed scenarios, tasks, and interfaces. The effectiveness and potential of the platform for human-swarm interaction studies are validated by several demonstrative experiments.

Via

Access Paper or Ask Questions

Deep Learning-Based Object Pose Estimation: A Comprehensive Survey

May 13, 2024

Jian Liu, Wei Sun, Hui Yang, Zhiwen Zeng, Chongpei Liu, Jin Zheng, Xingyu Liu, Hossein Rahmani, Nicu Sebe, Ajmal Mian

Abstract:Object pose estimation is a fundamental computer vision problem with broad applications in augmented reality and robotics. Over the past decade, deep learning models, due to their superior accuracy and robustness, have increasingly supplanted conventional algorithms reliant on engineered point pair features. Nevertheless, several challenges persist in contemporary methods, including their dependency on labeled training data, model compactness, robustness under challenging conditions, and their ability to generalize to novel unseen objects. A recent survey discussing the progress made on different aspects of this area, outstanding challenges, and promising future directions, is missing. To fill this gap, we discuss the recent advances in deep learning-based object pose estimation, covering all three formulations of the problem, i.e., instance-level, category-level, and unseen object pose estimation. Our survey also covers multiple input data modalities, degrees-of-freedom of output poses, object properties, and downstream tasks, providing readers with a holistic understanding of this field. Additionally, it discusses training paradigms of different domains, inference modes, application areas, evaluation metrics, and benchmark datasets, as well as reports the performance of current state-of-the-art methods on these benchmarks, thereby facilitating readers in selecting the most suitable method for their application. Finally, the survey identifies key challenges, reviews prevailing trends along with their pros and cons, and identifies promising directions for future research. We also keep tracing the latest works at https://github.com/CNJianLiu/Awesome-Object-Pose-Estimation.

* 27 pages, 7 figures

Via

Access Paper or Ask Questions

Robot Navigation in a Crowd by Integrating Deep Reinforcement Learning and Online Planning

Feb 26, 2021

Zhiqian Zhou, Pengming Zhu, Zhiwen Zeng, Junhao Xiao, Huimin Lu, Zongtan Zhou

Figure 1 for Robot Navigation in a Crowd by Integrating Deep Reinforcement Learning and Online Planning

Figure 2 for Robot Navigation in a Crowd by Integrating Deep Reinforcement Learning and Online Planning

Figure 3 for Robot Navigation in a Crowd by Integrating Deep Reinforcement Learning and Online Planning

Figure 4 for Robot Navigation in a Crowd by Integrating Deep Reinforcement Learning and Online Planning

Abstract:It is still an open and challenging problem for mobile robots navigating along time-efficient and collision-free paths in a crowd. The main challenge comes from the complex and sophisticated interaction mechanism, which requires the robot to understand the crowd and perform proactive and foresighted behaviors. Deep reinforcement learning is a promising solution to this problem. However, most previous learning methods incur a tremendous computational burden. To address these problems, we propose a graph-based deep reinforcement learning method, SG-DQN, that (i) introduces a social attention mechanism to extract an efficient graph representation for the crowd-robot state; (ii) directly evaluates the coarse q-values of the raw state with a learned dueling deep Q network(DQN); and then (iii) refines the coarse q-values via online planning on possible future trajectories. The experimental results indicate that our model can help the robot better understand the crowd and achieve a high success rate of more than 0.99 in the crowd navigation task. Compared against previous state-of-the-art algorithms, our algorithm achieves an equivalent, if not better, performance while requiring less than half of the computational cost.

* 8 pages, 7 figures, and two tables. The paper is in submission

Via

Access Paper or Ask Questions

Wavelet Denoising and Attention-based RNN-ARIMA Model to Predict Forex Price

Aug 16, 2020

Zhiwen Zeng, Matloob Khushi

Figure 1 for Wavelet Denoising and Attention-based RNN-ARIMA Model to Predict Forex Price

Figure 2 for Wavelet Denoising and Attention-based RNN-ARIMA Model to Predict Forex Price

Figure 3 for Wavelet Denoising and Attention-based RNN-ARIMA Model to Predict Forex Price

Figure 4 for Wavelet Denoising and Attention-based RNN-ARIMA Model to Predict Forex Price

Abstract:Every change of trend in the forex market presents a great opportunity as well as a risk for investors. Accurate forecasting of forex prices is a crucial element in any effective hedging or speculation strategy. However, the complex nature of the forex market makes the predicting problem challenging, which has prompted extensive research from various academic disciplines. In this paper, a novel approach that integrates the wavelet denoising, Attention-based Recurrent Neural Network (ARNN), and Autoregressive Integrated Moving Average (ARIMA) are proposed. Wavelet transform removes the noise from the time series to stabilize the data structure. ARNN model captures the robust and non-linear relationships in the sequence and ARIMA can well fit the linear correlation of the sequential information. By hybridization of the three models, the methodology is capable of modelling dynamic systems such as the forex market. Our experiments on USD/JPY five-minute data outperforms the baseline methods. Root-Mean-Squared-Error (RMSE) of the hybrid approach was found to be 1.65 with a directional accuracy of ~76%.

* IJCNN 2020

Via

Access Paper or Ask Questions