Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hanwen Zhang

URPlanner: A Universal Paradigm For Collision-Free Robotic Motion Planning Based on Deep Reinforcement Learning

May 26, 2025

Fengkang Ying, Hanwen Zhang, Haozhe Wang, Huishi Huang, Marcelo H. Ang Jr

Abstract:Collision-free motion planning for redundant robot manipulators in complex environments is yet to be explored. Although recent advancements at the intersection of deep reinforcement learning (DRL) and robotics have highlighted its potential to handle versatile robotic tasks, current DRL-based collision-free motion planners for manipulators are highly costly, hindering their deployment and application. This is due to an overreliance on the minimum distance between the manipulator and obstacles, inadequate exploration and decision-making by DRL, and inefficient data acquisition and utilization. In this article, we propose URPlanner, a universal paradigm for collision-free robotic motion planning based on DRL. URPlanner offers several advantages over existing approaches: it is platform-agnostic, cost-effective in both training and deployment, and applicable to arbitrary manipulators without solving inverse kinematics. To achieve this, we first develop a parameterized task space and a universal obstacle avoidance reward that is independent of minimum distance. Second, we introduce an augmented policy exploration and evaluation algorithm that can be applied to various DRL algorithms to enhance their performance. Third, we propose an expert data diffusion strategy for efficient policy learning, which can produce a large-scale trajectory dataset from only a few expert demonstrations. Finally, the superiority of the proposed methods is comprehensively verified through experiments.

* Version 1. 20 pages, 19 figures

Via

Access Paper or Ask Questions

Challenges and Trends in Egocentric Vision: A Survey

Mar 19, 2025

Xiang Li, Heqian Qiu, Lanxiao Wang, Hanwen Zhang, Chenghao Qi, Linfeng Han, Huiyu Xiong, Hongliang Li

Abstract:With the rapid development of artificial intelligence technologies and wearable devices, egocentric vision understanding has emerged as a new and challenging research direction, gradually attracting widespread attention from both academia and industry. Egocentric vision captures visual and multimodal data through cameras or sensors worn on the human body, offering a unique perspective that simulates human visual experiences. This paper provides a comprehensive survey of the research on egocentric vision understanding, systematically analyzing the components of egocentric scenes and categorizing the tasks into four main areas: subject understanding, object understanding, environment understanding, and hybrid understanding. We explore in detail the sub-tasks within each category. We also summarize the main challenges and trends currently existing in the field. Furthermore, this paper presents an overview of high-quality egocentric vision datasets, offering valuable resources for future research. By summarizing the latest advancements, we anticipate the broad applications of egocentric vision technologies in fields such as augmented reality, virtual reality, and embodied intelligence, and propose future research directions based on the latest developments in the field.

Via

Access Paper or Ask Questions

Advancing Generative Artificial Intelligence and Large Language Models for Demand Side Management with Electric Vehicles

Jan 26, 2025

Hanwen Zhang, Ruichen Zhang, Wei Zhang, Dusit Niyato, Yonggang Wen

Abstract:Generative artificial intelligence, particularly through large language models (LLMs), is poised to transform energy optimization and demand side management (DSM) within microgrids. This paper explores the integration of LLMs into energy management, emphasizing their roles in automating the optimization of DSM strategies with electric vehicles. We investigate challenges and solutions associated with DSM and explore the new opportunities presented by leveraging LLMs. Then, We propose an innovative solution that enhances LLMs with retrieval-augmented generation for automatic problem formulation, code generation, and customizing optimization. We present a case study to demonstrate the effectiveness of our proposed solution in charging scheduling and optimization for electric vehicles, highlighting our solution's significant advancements in energy efficiency and user adaptability. This work underscores the potential of LLMs for energy optimization and fosters a new era of intelligent DSM solutions.

* 9 Pages

Via

Access Paper or Ask Questions

Optimal Bounds for Private Minimum Spanning Trees via Input Perturbation

Dec 13, 2024

Rasmus Pagh, Lukas Retschmeier, Hao Wu, Hanwen Zhang

Figure 1 for Optimal Bounds for Private Minimum Spanning Trees via Input Perturbation

Figure 2 for Optimal Bounds for Private Minimum Spanning Trees via Input Perturbation

Figure 3 for Optimal Bounds for Private Minimum Spanning Trees via Input Perturbation

Figure 4 for Optimal Bounds for Private Minimum Spanning Trees via Input Perturbation

Abstract:We study the problem of privately releasing an approximate minimum spanning tree (MST). Given a graph $G = (V, E, \vec{W})$ where $V$ is a set of $n$ vertices, $E$ is a set of $m$ undirected edges, and $ \vec{W} \in \mathbb{R}^{|E|} $ is an edge-weight vector, our goal is to publish an approximate MST under edge-weight differential privacy, as introduced by Sealfon in PODS 2016, where $V$ and $E$ are considered public and the weight vector is private. Our neighboring relation is $\ell_\infty$-distance on weights: for a sensitivity parameter $\Delta_\infty$, graphs $ G = (V, E, \vec{W}) $ and $ G' = (V, E, \vec{W}') $ are neighboring if $\|\vec{W}-\vec{W}'\|_\infty \leq \Delta_\infty$. Existing private MST algorithms face a trade-off, sacrificing either computational efficiency or accuracy. We show that it is possible to get the best of both worlds: With a suitable random perturbation of the input that does not suffice to make the weight vector private, the result of any non-private MST algorithm will be private and achieves a state-of-the-art error guarantee. Furthermore, by establishing a connection to Private Top-k Selection [Steinke and Ullman, FOCS '17], we give the first privacy-utility trade-off lower bound for MST under approximate differential privacy, demonstrating that the error magnitude, $\tilde{O}(n^{3/2})$, is optimal up to logarithmic factors. That is, our approach matches the time complexity of any non-private MST algorithm and at the same time achieves optimal error. We complement our theoretical treatment with experiments that confirm the practicality of our approach.

Via

Access Paper or Ask Questions

CoMA: Compositional Human Motion Generation with Multi-modal Agents

Dec 10, 2024

Shanlin Sun, Gabriel De Araujo, Jiaqi Xu, Shenghan Zhou, Hanwen Zhang, Ziheng Huang, Chenyu You, Xiaohui Xie

Figure 1 for CoMA: Compositional Human Motion Generation with Multi-modal Agents

Figure 2 for CoMA: Compositional Human Motion Generation with Multi-modal Agents

Figure 3 for CoMA: Compositional Human Motion Generation with Multi-modal Agents

Figure 4 for CoMA: Compositional Human Motion Generation with Multi-modal Agents

Abstract:3D human motion generation has seen substantial advancement in recent years. While state-of-the-art approaches have improved performance significantly, they still struggle with complex and detailed motions unseen in training data, largely due to the scarcity of motion datasets and the prohibitive cost of generating new training examples. To address these challenges, we introduce CoMA, an agent-based solution for complex human motion generation, editing, and comprehension. CoMA leverages multiple collaborative agents powered by large language and vision models, alongside a mask transformer-based motion generator featuring body part-specific encoders and codebooks for fine-grained control. Our framework enables generation of both short and long motion sequences with detailed instructions, text-guided motion editing, and self-correction for improved quality. Evaluations on the HumanML3D dataset demonstrate competitive performance against state-of-the-art methods. Additionally, we create a set of context-rich, compositional, and long text prompts, where user studies show our method significantly outperforms existing approaches.

* Project Page: https://gabrie-l.github.io/coma-page/

Via

Access Paper or Ask Questions

Model-based Deep Learning for Rate Split Multiple Access in Vehicular Communications

May 02, 2024

Hanwen Zhang, Mingzhe Chen, Alireza Vahid, Haijian Sun

Figure 1 for Model-based Deep Learning for Rate Split Multiple Access in Vehicular Communications

Figure 2 for Model-based Deep Learning for Rate Split Multiple Access in Vehicular Communications

Figure 3 for Model-based Deep Learning for Rate Split Multiple Access in Vehicular Communications

Figure 4 for Model-based Deep Learning for Rate Split Multiple Access in Vehicular Communications

Abstract:Rate split multiple access (RSMA) has been proven as an effective communication scheme for 5G and beyond, especially in vehicular scenarios. However, RSMA requires complicated iterative algorithms for proper resource allocation, which cannot fulfill the stringent latency requirement in resource constrained vehicles. Although data driven approaches can alleviate this issue, they suffer from poor generalizability and scarce training data. In this paper, we propose a fractional programming (FP) based deep unfolding (DU) approach to address resource allocation problem for a weighted sum rate optimization in RSMA. By carefully designing the penalty function, we couple the variable update with projected gradient descent algorithm (PGD). Following the structure of PGD, we embed few learnable parameters in each layer of the DU network. Through extensive simulation, we have shown that the proposed model-based neural networks has similar performance as optimal results given by traditional algorithm but with much lower computational complexity, less training data, and higher resilience to test set data and out-of-distribution (OOD) data.

* submitted to IEEE conference

Via

Access Paper or Ask Questions

Exploring Communication Technologies, Standards, and Challenges in Electrified Vehicle Charging

Mar 25, 2024

Xiang Ma, Yuan Zhou, Hanwen Zhang, Qun Wang, Haijian Sun, Hongjie Wang, Rose Qingyang Hu

Figure 1 for Exploring Communication Technologies, Standards, and Challenges in Electrified Vehicle Charging

Figure 2 for Exploring Communication Technologies, Standards, and Challenges in Electrified Vehicle Charging

Figure 3 for Exploring Communication Technologies, Standards, and Challenges in Electrified Vehicle Charging

Figure 4 for Exploring Communication Technologies, Standards, and Challenges in Electrified Vehicle Charging

Abstract:As public awareness of environmental protection continues to grow, the trend of integrating more electric vehicles (EVs) into the transportation sector is rising. Unlike conventional internal combustion engine (ICE) vehicles, EVs can minimize carbon emissions and potentially achieve autonomous driving. However, several obstacles hinder the widespread adoption of EVs, such as their constrained driving range and the extended time required for charging. One alternative solution to address these challenges is implementing dynamic wireless power transfer (DWPT), charging EVs in motion on the road. Moreover, charging stations with static wireless power transfer (SWPT) infrastructure can replace existing gas stations, enabling users to charge EVs in parking lots or at home. This paper surveys the communication infrastructure for static and dynamic wireless charging in electric vehicles. It encompasses all communication aspects involved in the wireless charging process. The architecture and communication requirements for static and dynamic wireless charging are presented separately. Additionally, a comprehensive comparison of existing communication standards is provided. The communication with the grid is also explored in detail. The survey gives attention to security and privacy issues arising during communications. In summary, the paper addresses the challenges and outlines upcoming trends in communication for EV wireless charging.

* submitted to IET Communication as a survey paper

Via

Access Paper or Ask Questions

Automated interpretation of congenital heart disease from multi-view echocardiograms

Nov 30, 2023

Jing Wang, Xiaofeng Liu, Fangyun Wang, Lin Zheng, Fengqiao Gao, Hanwen Zhang, Xin Zhang, Wanqing Xie, Binbin Wang

Figure 1 for Automated interpretation of congenital heart disease from multi-view echocardiograms

Figure 2 for Automated interpretation of congenital heart disease from multi-view echocardiograms

Figure 3 for Automated interpretation of congenital heart disease from multi-view echocardiograms

Figure 4 for Automated interpretation of congenital heart disease from multi-view echocardiograms

Abstract:Congenital heart disease (CHD) is the most common birth defect and the leading cause of neonate death in China. Clinical diagnosis can be based on the selected 2D key-frames from five views. Limited by the availability of multi-view data, most methods have to rely on the insufficient single view analysis. This study proposes to automatically analyze the multi-view echocardiograms with a practical end-to-end framework. We collect the five-view echocardiograms video records of 1308 subjects (including normal controls, ventricular septal defect (VSD) patients and atrial septal defect (ASD) patients) with both disease labels and standard-view key-frame labels. Depthwise separable convolution-based multi-channel networks are adopted to largely reduce the network parameters. We also approach the imbalanced class problem by augmenting the positive training samples. Our 2D key-frame model can diagnose CHD or negative samples with an accuracy of 95.4\%, and in negative, VSD or ASD classification with an accuracy of 92.3\%. To further alleviate the work of key-frame selection in real-world implementation, we propose an adaptive soft attention scheme to directly explore the raw video data. Four kinds of neural aggregation methods are systematically investigated to fuse the information of an arbitrary number of frames in a video. Moreover, with a view detection module, the system can work without the view records. Our video-based model can diagnose with an accuracy of 93.9\% (binary classification), and 92.1\% (3-class classification) in a collected 2D video testing set, which does not need key-frame selection and view annotation in testing. The detailed ablation study and the interpretability analysis are provided.

* Medical Image Analysis (Volume 69, April 2021, 101942)
* Published in Medical Image Analysis

Via

Access Paper or Ask Questions

Filling the Missing: Exploring Generative AI for Enhanced Federated Learning over Heterogeneous Mobile Edge Devices

Oct 29, 2023

Peichun Li, Hanwen Zhang, Yuan Wu, Liping Qian, Rong Yu, Dusit Niyato, Xuemin Shen

Figure 1 for Filling the Missing: Exploring Generative AI for Enhanced Federated Learning over Heterogeneous Mobile Edge Devices

Figure 2 for Filling the Missing: Exploring Generative AI for Enhanced Federated Learning over Heterogeneous Mobile Edge Devices

Figure 3 for Filling the Missing: Exploring Generative AI for Enhanced Federated Learning over Heterogeneous Mobile Edge Devices

Figure 4 for Filling the Missing: Exploring Generative AI for Enhanced Federated Learning over Heterogeneous Mobile Edge Devices

Abstract:Distributed Artificial Intelligence (AI) model training over mobile edge networks encounters significant challenges due to the data and resource heterogeneity of edge devices. The former hampers the convergence rate of the global model, while the latter diminishes the devices' resource utilization efficiency. In this paper, we propose a generative AI-empowered federated learning to address these challenges by leveraging the idea of FIlling the MIssing (FIMI) portion of local data. Specifically, FIMI can be considered as a resource-aware data augmentation method that effectively mitigates the data heterogeneity while ensuring efficient FL training. We first quantify the relationship between the training data amount and the learning performance. We then study the FIMI optimization problem with the objective of minimizing the device-side overall energy consumption subject to required learning performance constraints. The decomposition-based analysis and the cross-entropy searching method are leveraged to derive the solution, where each device is assigned suitable AI-synthesized data and resource utilization policy. Experiment results demonstrate that FIMI can save up to 50% of the device-side energy to achieve the target global test accuracy in comparison with the existing methods. Meanwhile, FIMI can significantly enhance the converged global accuracy under the non-independently-and-identically distribution (non-IID) data.

* 13 pages, 5 figures. Submitted to IEEE for possible publication

Via

Access Paper or Ask Questions

Energy Efficient Robust Beamforming for Vehicular ISAC with Imperfect Channel Estimation

Oct 26, 2023

Hanwen Zhang, Haijian Sun, Tianyi He, Weiming Xiang, Rose Qingyang Hu

Abstract:This paper investigates robust beamforming for system-centric energy efficiency (EE) optimization in the vehicular integrated sensing and communication (ISAC) system, where the mobility of vehicles poses significant challenges to channel estimation. To obtain the optimal beamforming under channel uncertainty, we first formulate an optimization problem for maximizing the system EE under bounded channel estimation errors. Next, fractional programming and semidefinite relaxation (SDR) are utilized to relax the rank-1 constraints. We further use Schur complement and S-Procedure to transform Cramer-Rao bound (CRB) and channel estimation error constraints into convex forms, respectively. Based on the Lagrangian dual function and Karush-Kuhn-Tucker (KKT) conditions, it is proved that the optimal beamforming solution is rank-1. Finally, we present comprehensive simulation results to demonstrate two key findings: 1) the proposed algorithm exhibits a favorable convergence rate, and 2) the approach effectively mitigates the impact of channel estimation errors.

* Submitted to IEEE for future publication

Via

Access Paper or Ask Questions