Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yuanzhe Geng

Dynamic Relay Selection and Power Allocation for Minimizing Outage Probability: A Hierarchical Reinforcement Learning Approach

Nov 10, 2020

Yuanzhe Geng, Erwu Liu, Rui Wang, Yiming Liu

Figure 1 for Dynamic Relay Selection and Power Allocation for Minimizing Outage Probability: A Hierarchical Reinforcement Learning Approach

Figure 2 for Dynamic Relay Selection and Power Allocation for Minimizing Outage Probability: A Hierarchical Reinforcement Learning Approach

Figure 3 for Dynamic Relay Selection and Power Allocation for Minimizing Outage Probability: A Hierarchical Reinforcement Learning Approach

Figure 4 for Dynamic Relay Selection and Power Allocation for Minimizing Outage Probability: A Hierarchical Reinforcement Learning Approach

Abstract:Cooperative communication is an effective approach to improve spectrum utilization. When considering relay selection and power allocation in cooperative communication, most of the existing studies require the assumption of channel state information (CSI). However, it is difficult to get an accurate CSI in practice. In this paper, we consider an outage-based method subjected to a total transmission power constraint in the two-hop cooperative communication scenario. We use reinforcement learning (RL) methods to learn strategies, and complete the optimal relay selection and power allocation, which do not need any prior knowledge of CSI but simply rely on the interaction with the communication environment. It is noted that conventional RL methods, including common deep reinforcement learning (DRL) methods, perform poorly when the search space is large. Therefore, we first propose a practical DRL framework with an outage-based reward function, which is used as a baseline. Then, we further propose our novel hierarchical reinforcement learning (HRL) algorithm for dynamic relay selection and power allocation. A key difference from other RL-based methods in existing literatures is that, our HRL approach decomposes relay selection and power allocation into two hierarchical optimization objectives, which are trained in different levels. Simulation results reveal that our HRL algorithm trains faster and obtains a lower outage probability when compared with traditional DRL methods, especially in a sparse reward environment.

Via

Access Paper or Ask Questions

Deep Reinforcement Learning Based Dynamic Route Planning for Minimizing Travel Time

Nov 03, 2020

Yuanzhe Geng, Erwu Liu, Rui Wang, Yiming Liu

Figure 1 for Deep Reinforcement Learning Based Dynamic Route Planning for Minimizing Travel Time

Figure 2 for Deep Reinforcement Learning Based Dynamic Route Planning for Minimizing Travel Time

Figure 3 for Deep Reinforcement Learning Based Dynamic Route Planning for Minimizing Travel Time

Figure 4 for Deep Reinforcement Learning Based Dynamic Route Planning for Minimizing Travel Time

Abstract:Route planning is important in transportation. Existing works focus on finding the shortest path solution or using metrics such as safety and energy consumption to determine the planning. It is noted that most of these studies rely on prior knowledge of road network, which may be not available in certain situations. In this paper, we design a route planning algorithm based on deep reinforcement learning (DRL) for pedestrians. We use travel time consumption as the metric, and plan the route by predicting pedestrian flow in the road network. We put an agent, which is an intelligent robot, on a virtual map. Different from previous studies, our approach assumes that the agent does not need any prior information about road network, but simply relies on the interaction with the environment. We propose a dynamically adjustable route planning (DARP) algorithm, where the agent learns strategies through a dueling deep Q network to avoid congested roads. Simulation results show that the DARP algorithm saves 52% of the time under congestion condition when compared with traditional shortest path planning algorithms.

Via

Access Paper or Ask Questions