Guangming Xie

Guided Policy Optimization under Partial Observability

May 21, 2025

Yueheng Li, Guangming Xie, Zongqing Lu

Abstract: Reinforcement Learning (RL) in partially observable environments poses significant challenges due to the complexity of learning under uncertainty. While additional information, such as that available in simulations, can enhance training, effectively leveraging it remains an open problem. To address this, we introduce Guided Policy Optimization (GPO), a framework that co-trains a guider and a learner. The guider takes advantage of privileged information while ensuring alignment with the learner's policy that is primarily trained via imitation learning. We theoretically demonstrate that this learning scheme achieves optimality comparable to direct RL, thereby overcoming key limitations inherent in existing approaches. Empirical evaluations show strong performance of GPO across various tasks, including continuous control with partial observability and noise, and memory-based challenges, significantly outperforming existing methods.

* 24 pages, 13 figures

Value Function Decomposition in Markov Recommendation Process

Jan 29, 2025

Xiaobei Wang, Shuchang Liu, Qingpeng Cai, Xiang Li, Lantao Hu, Han Li, Guangming Xie

Abstract: Recent advances in recommender systems have shown that user-system interaction essentially formulates long-term optimization problems, and online reinforcement learning can be adopted to improve recommendation performance. The general solution framework incorporates a value function that estimates the user's expected cumulative rewards in the future and guides the training of the recommendation policy. To avoid local maxima, the policy may explore potential high-quality actions during inference to increase the chance of finding better future rewards. To accommodate the stepwise recommendation process, one widely adopted approach to learning the value function is learning from the difference between the values of two consecutive states of a user. However, we argue that this paradigm involves an incorrect approximation in the stochastic process. Specifically, between the current state and the next state in each training sample, there exist two separate random factors from the stochastic policy and the uncertain user environment. Original temporal difference (TD) learning under these mixed random factors may result in a suboptimal estimation of the long-term rewards. As a solution, we show that these two factors can be separately approximated by decomposing the original temporal difference loss. The disentangled learning framework can achieve a more accurate estimation with faster learning and improved robustness against action exploration. As empirical verification of our proposed method, we conduct offline experiments with online simulated environments built based on public datasets.

* 14 pages, 9 figures
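
For readers unfamiliar with the baseline this abstract criticizes, the following is a minimal sketch of ordinary tabular TD(0), which learns a value function from the difference between the values of two consecutive states. It is not the paper's decomposed loss. The three-state chain, the rewards, and the names `td0_update` and `evaluate_chain` are hypothetical and chosen only for illustration. The transitions here are deterministic, so the two random factors discussed in the abstract (stochastic policy and uncertain environment) do not appear.

```python
# Plain tabular TD(0); an illustrative baseline, not the paper's method.
# Hypothetical chain: s0 -> s1 -> s2 (terminal), rewards 0.0 then 1.0.

def td0_update(values, state, reward, next_state,
               alpha=0.1, gamma=0.9, terminal=False):
    """Apply one TD(0) update in place and return the TD error."""
    # Bootstrapped target: reward plus discounted value of the next state.
    target = reward + (0.0 if terminal else gamma * values[next_state])
    td_error = target - values[state]
    values[state] += alpha * td_error
    return td_error


def evaluate_chain(episodes=2000, alpha=0.1, gamma=0.9):
    """Estimate state values of the hypothetical chain by repeated TD(0) sweeps."""
    values = [0.0, 0.0, 0.0]
    # Each tuple: (state, reward, next_state, is_terminal_transition).
    transitions = [(0, 0.0, 1, False), (1, 1.0, 2, True)]
    for _ in range(episodes):
        for state, reward, next_state, terminal in transitions:
            td0_update(values, state, reward, next_state,
                       alpha, gamma, terminal)
    return values


if __name__ == "__main__":
    print(evaluate_chain())
```

With a discount factor of 0.9, the estimates approach 1.0 for s1 and 0.9 for s0, which are the exact discounted returns of this chain.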

Future Impact Decomposition in Request-level Recommendations

Feb 07, 2024

Xiaobei Wang, Shuchang Liu, Xueliang Wang, Qingpeng Cai, Lantao Hu, Han Li, Peng Jiang, Guangming Xie

Abstract: In recommender systems, reinforcement learning solutions have shown promising results in optimizing the interaction sequence between users and the system for long-term performance. For practical reasons, the policy's actions are typically designed as recommending a list of items to handle users' frequent and continuous browsing requests more efficiently. In this list-wise recommendation scenario, the user state is updated upon every request in the corresponding MDP formulation. However, this request-level formulation is essentially inconsistent with the user's item-level behavior. In this study, we demonstrate that an item-level optimization approach can better utilize item characteristics and optimize the policy's performance even under the request-level MDP. We support this claim by comparing the performance of standard request-level methods with the proposed item-level actor-critic framework in both simulation and online experiments. Furthermore, we show that a reward-based future decomposition strategy can better express the item-wise future impact and improve the recommendation accuracy in the long term. To achieve a more thorough understanding of the decomposition strategy, we propose a model-based re-weighting framework with adversarial learning that further boosts the performance, and we investigate its correlation with the reward-based strategy.

* 13 pages, 8 figures

MACC: Cross-Layer Multi-Agent Congestion Control with Deep Reinforcement Learning

Jun 04, 2022

Jianing Bai, Tianhao Zhang, Guangming Xie

Abstract: Congestion Control (CC), the core networking task of efficiently utilizing network capacity, has received great attention and is widely used in various Internet communication applications such as 5G, Internet-of-Things, UAN, and more. Various CC algorithms have been proposed at both the network and transport layers, such as the Active Queue Management (AQM) algorithm and the Transmission Control Protocol (TCP) congestion control mechanism. However, it is hard to model the dynamic AQM/TCP system and to make the two algorithms cooperate so as to obtain excellent performance under different communication scenarios. In this paper, we explore the performance of multi-agent reinforcement learning-based cross-layer congestion control algorithms and present the cooperation performance of two agents, known as MACC (Multi-agent Congestion Control). We implement MACC in NS3. The simulation results show that our scheme outperforms other congestion control combinations in terms of throughput, delay, etc. This not only shows that networking protocols based on multi-agent deep reinforcement learning are efficient for communication management, but also verifies that the networking area can serve as a new playground for machine learning algorithms.

* 7 pages, 8 figures

Decentralized Circle Formation Control for Fish-like Robots in the Real-world via Reinforcement Learning

Mar 09, 2021

Tianhao Zhang, Yueheng Li, Shuai Li, Qiwei Ye, Chen Wang, Guangming Xie

Abstract: In this paper, the circle formation control problem is addressed for a group of cooperative underactuated fish-like robots involving unknown nonlinear dynamics and disturbances. Based on reinforcement learning and cognitive consistency theory, we propose a decentralized controller that requires no knowledge of the dynamics of the fish-like robots. The proposed controller can be transferred from simulation to reality. It is trained only in our established simulation environment, and the trained controller can be deployed to real robots without any manual tuning. Simulation results confirm that the proposed model-free robust formation control method is scalable with respect to the group size of the robots and outperforms other representative RL algorithms. Several experiments in the real world verify the effectiveness of our RL-based approach for circle formation control.

* to be published in ICRA2021

An Electrocommunication System Using FSK Modulation and Deep Learning Based Demodulation for Underwater Robots

Aug 24, 2020

Qinghao Wang, Ruijun Liu, Wei Wang, Guangming Xie

Abstract: Underwater communication is extremely challenging for small underwater robots, which typically have stringent power and size constraints. In our previous work, we developed an artificial electrocommunication system which could be an alternative for the communication of small underwater robots. This paper further presents a new electrocommunication system that utilizes Binary Frequency Shift Keying (2FSK) modulation and deep-learning-based demodulation for underwater robots. We first derive an underwater electrocommunication model that covers both the near-field area and a large transition area outside of the near-field area. 2FSK modulation is adopted to improve the anti-interference ability of the electric signal. A deep learning algorithm is used by the receiver to demodulate the electric signal. Simulations and experiments show that under the same testing conditions, the new communication system outperforms the previous system in both communication distance and data transmission rate. Specifically, the newly developed communication system achieves stable communication within a distance of 10 m at a data transfer rate of 5 Kbps with a power consumption of less than 0.1 W. The substantial increase in communication distance further improves the feasibility of electrocommunication in underwater robotics.

* IROS2020
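
To make the 2FSK idea concrete, here is a minimal sketch of binary frequency shift keying in plain Python: each bit is sent as a burst of one of two tones. The demodulator below is a classical non-coherent energy detector, swapped in for simplicity; it is not the deep-learning demodulator described in the abstract. The tone frequencies, sample rate, bit length, and the function names `fsk_modulate`, `tone_energy`, and `fsk_demodulate` are all hypothetical and do not reflect the paper's parameters or channel model.

```python
import math

# Hypothetical parameters for illustration only (not taken from the paper).
F0 = 1000.0            # tone for bit 0, in Hz
F1 = 2000.0            # tone for bit 1, in Hz
FS = 16000.0           # sample rate, in Hz
SAMPLES_PER_BIT = 160  # 10 ms per bit, so both tones fit whole cycles


def fsk_modulate(bits):
    """Map each bit to a sine burst at F0 (bit 0) or F1 (bit 1)."""
    signal = []
    for i, bit in enumerate(bits):
        freq = F1 if bit else F0
        for n in range(SAMPLES_PER_BIT):
            t = (i * SAMPLES_PER_BIT + n) / FS
            signal.append(math.sin(2.0 * math.pi * freq * t))
    return signal


def tone_energy(chunk, freq, start):
    """Energy of `chunk` at `freq`, from in-phase and quadrature correlation."""
    i_sum = 0.0
    q_sum = 0.0
    for n, x in enumerate(chunk):
        t = (start + n) / FS
        i_sum += x * math.cos(2.0 * math.pi * freq * t)
        q_sum += x * math.sin(2.0 * math.pi * freq * t)
    return i_sum * i_sum + q_sum * q_sum


def fsk_demodulate(signal):
    """Non-coherent detector: per bit period, pick the tone with more energy."""
    bits = []
    for start in range(0, len(signal), SAMPLES_PER_BIT):
        chunk = signal[start:start + SAMPLES_PER_BIT]
        e0 = tone_energy(chunk, F0, start)
        e1 = tone_energy(chunk, F1, start)
        bits.append(1 if e1 > e0 else 0)
    return bits


if __name__ == "__main__":
    message = [1, 0, 1, 1, 0, 0, 1]
    print(fsk_demodulate(fsk_modulate(message)) == message)
```

Because the information is carried by frequency rather than amplitude, the detector only compares the energy at the two tones, which is one reason FSK tolerates amplitude disturbances reasonably well.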

Motion Planning for Heterogeneous Unmanned Systems under Partial Observation from UAV

Jul 28, 2020

Ci Chen, Yuanfang Wan, Baowei Li, Chen Wang, Guangming Xie, Huanyu Jiang

Abstract: For heterogeneous unmanned systems composed of unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs), using UAVs as eyes to assist UGVs in motion planning is a promising research direction due to the UAVs' vast view scope. However, due to the UAVs' flight altitude limitations, it may be impossible to observe the global map, and motion planning in the local map is a POMDP (Partially Observable Markov Decision Process) problem. This paper proposes a motion planning algorithm for heterogeneous unmanned systems under partial observation from a UAV without reconstruction of global maps. The algorithm consists of two parts, designed for perception and decision-making, respectively. For the perception part, we propose the Grid Map Generation Network (GMGN), which is used to perceive scenes from the UAV's perspective and classify the pathways and obstacles. For the decision-making part, we propose the Motion Command Generation Network (MCGN). Thanks to the addition of a memory mechanism, MCGN has planning and reasoning abilities under partial observation from UAVs. We evaluate our proposed algorithm by comparing it with baseline algorithms. The results show that our method effectively plans the motion of heterogeneous unmanned systems and achieves a relatively high success rate.

A Thermoplastic Elastomer Belt Based Robotic Gripper

Jun 24, 2020

Xingwen Zheng, Ningzhe Hou, Pascal Johannes Daniel Dinjens, Ruifeng Wang, Chengyang Dong, Guangming Xie

Abstract: Novel robotic grippers have attracted increasing interest recently because of their ability to adapt to a variety of circumstances and their powerful functionalities. Unlike traditional grippers, whose fingers are made of mechanical components, novel robotic grippers are typically made of novel structures and materials, using a novel manufacturing process. In this paper, a novel robotic gripper with an external frame and an internal net made of thermoplastic elastomer belts is proposed. The gripper grasps objects using the friction between the net and the objects. It is capable of adaptive gripping through a flexible contact surface. Stress simulation has been used to explore the relationship between the normal stress on the net and the deformation of the net. Experiments are conducted on a variety of objects to measure the force needed to reliably grip and hold the object. Test results show that the gripper can successfully grip objects with varying shapes, dimensions, and textures. The gripper is promising for grasping fragile objects in industry or in the field, and for grasping marine organisms without hurting them.

Artificial Lateral Line Based Relative State Estimation for Two Adjacent Robotic Fish

Jun 23, 2020

Xingwen Zheng, Wei Wang, Liang Li, Guangming Xie

Abstract: The lateral line enables fish to efficiently sense the surrounding environment, thus assisting flow-related fish behaviours. Inspired by this phenomenon, a variety of artificial lateral line systems (ALLSs) have been developed and applied to underwater robots. This article focuses on using the hydrodynamic pressure variations (HPVs) measured by an ALLS based on pressure sensor arrays to estimate the relative state between two adjacent robotic fish in a leader-follower formation. The relative states include the relative oscillating frequency, amplitude, and offset of the upstream robotic fish to the downstream robotic fish, the relative vertical distance, the relative yaw angle, the relative pitch angle, and the relative roll angle between the two adjacent robotic fish. A regression model between the ALLS-measured HPVs and the mentioned relative states is investigated, and regression model-based relative state estimation is conducted. Specifically, two criteria are first proposed to investigate not only the sensitivity of each pressure sensor to the variations of relative state but also the insufficiency and redundancy of the pressure sensors. The pressure sensors used for regression analysis are thus determined. Then four typical regression methods, namely the random forest algorithm, support vector regression, back propagation neural network, and multiple linear regression, are used to establish regression models between the ALLS-measured HPVs and the relative states. The regression effects of the four methods are then compared and discussed. Finally, the random forest-based method, which has the best regression effect, is used to estimate the relative yaw angle and oscillating amplitude from the ALLS-measured HPVs and exhibits excellent estimation performance. This work contributes to local relative estimation for a group of underwater robots, which has always been a challenge.

Three-Dimensional Dynamic Modeling and Motion Analysis for an Active-Tail-Actuated Robotic Fish with Barycentre Regulating Mechanism

Jun 23, 2020

Xingwen Zheng, Minglei Xiong, Junzheng Zheng, Manyi Wang, Runyu Tian, Guangming Xie

Abstract: Dynamic modeling has been attracting attention for its fundamental role in precise locomotion analysis and control of underwater robots. However, existing research has mainly focused on the two-dimensional motion of underwater robots, and little attention has been paid to three-dimensional dynamic modeling, which is the focus of this work. In this article, a three-dimensional dynamic model of an active-tail-actuated robotic fish with a barycentre regulating mechanism is built by combining Newton's second law for linear motion and Euler's equation for angular motion. The model parameters are determined by the three-dimensional computer-aided design (CAD) software SolidWorks, HyperFlow-based computational fluid dynamics (CFD) simulation, and a grey-box model estimation method. Both kinematic experiments with a prototype and numerical simulations are applied to mutually validate the accuracy of the dynamic model. Based on the dynamic model, multiple three-dimensional motions, including rectilinear motion, turning motion, gliding motion, and spiral motion, are analyzed. The experimental and simulation results demonstrate the effectiveness of the proposed model in evaluating the trajectory, attitude, and motion parameters, including the velocity, turning radius, angular velocity, etc., of the robotic fish.
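
As a reminder of what combining Newton's second law with Euler's equation looks like, the textbook Newton-Euler equations for a single rigid body are sketched below. This is the generic form only, not the paper's model. It assumes a body-fixed frame with its origin at the centre of mass, and the symbols are assumptions introduced here: $m$ is the mass, $\mathbf{I}$ the inertia tensor, $\mathbf{v}$ and $\boldsymbol{\omega}$ the linear and angular velocity in the body frame, and $\mathbf{F}$ and $\mathbf{M}$ the total external force and moment. Hydrodynamic effects and the shifting centre of mass caused by a barycentre regulating mechanism are not captured by this form.

```latex
\begin{aligned}
m\left(\dot{\mathbf{v}} + \boldsymbol{\omega} \times \mathbf{v}\right) &= \mathbf{F}, \\
\mathbf{I}\,\dot{\boldsymbol{\omega}} + \boldsymbol{\omega} \times \left(\mathbf{I}\,\boldsymbol{\omega}\right) &= \mathbf{M}.
\end{aligned}
```

The cross-product terms arise because the velocities are expressed in a rotating body frame rather than an inertial one.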
