Abstract: Meeting the strict Quality of Service (QoS) requirements of terminals imposes a significant challenge on Multi-access Edge Computing (MEC) systems due to their limited multidimensional resources. To address this challenge, we propose a collaborative MEC framework that facilitates resource sharing among edge servers, with the aim of maximizing the long-term QoS and reducing the cache switching cost through the joint optimization of service caching, collaborative offloading, and computation and communication resource allocation. The dual-timescale nature of the problem and the temporal recurrence between service caching and the other resource-allocation decisions make it particularly challenging to solve. To this end, we propose a deep reinforcement learning (DRL)-based dual-timescale scheme, called DGL-DDPG, which combines a short-term genetic algorithm (GA) with a long short-term memory network-based deep deterministic policy gradient (LSTM-DDPG). Specifically, we reformulate the optimization problem as a Markov decision process (MDP) in which the small-timescale resource-allocation decisions generated by an improved GA serve as the states, which are fed into a centralized LSTM-DDPG agent that generates the large-timescale service caching decisions. Simulation results demonstrate that the proposed algorithm outperforms the baseline algorithms in terms of average QoS and cache switching cost.
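To make the dual-timescale structure concrete, the sketch below illustrates the control loop the abstract describes: an inner GA optimizes resource allocation at each small timescale, and the sequence of resulting allocations is fed as the state to a centralized actor that picks the next large-timescale caching decision. This is a minimal, hypothetical illustration, not the paper's implementation: the problem sizes, the toy QoS surrogate, the GA operators, and the `lstm_ddpg_actor` stub (standing in for the actual LSTM-DDPG network) are all assumptions made for readability.

```python
import random
import numpy as np

# Hypothetical problem sizes (not taken from the paper).
N_SERVERS, N_SERVICES = 4, 6   # edge servers, candidate services
SLOTS_PER_FRAME = 5            # small timescales per large timescale
GA_POP, GA_GENS = 20, 30       # GA population size / generations

def qos(caching, alloc):
    """Toy QoS surrogate: reward allocating resources to cached services.
    Stands in for the paper's QoS model, which the abstract does not specify."""
    return float(np.sum(alloc * caching))

def ga_allocate(caching):
    """Placeholder for the improved GA: evolve a per-server resource-allocation
    matrix for one small timescale, given the current caching decision."""
    pop = [np.random.dirichlet(np.ones(N_SERVICES), size=N_SERVERS)
           for _ in range(GA_POP)]
    for _ in range(GA_GENS):
        pop.sort(key=lambda a: qos(caching, a), reverse=True)
        elite = pop[: GA_POP // 2]  # keep the fitter half
        children = [np.clip((random.choice(elite) + random.choice(elite)) / 2
                            + 0.01 * np.random.randn(N_SERVERS, N_SERVICES),
                            0, None)
                    for _ in range(GA_POP - len(elite))]
        # Renormalize each server's allocation so it sums to 1.
        pop = elite + [c / c.sum(axis=1, keepdims=True) for c in children]
    return pop[0]

def lstm_ddpg_actor(state_sequence):
    """Stub for the centralized LSTM-DDPG actor: maps the sequence of
    small-timescale allocation states to the next caching decision."""
    score = np.mean(state_sequence, axis=0).mean(axis=0)  # per-service score
    top = np.argsort(score)[-3:]                          # cache top-3 services
    caching = np.zeros(N_SERVICES)
    caching[top] = 1.0
    return caching

caching = np.ones(N_SERVICES)  # initial caching decision
for frame in range(3):  # large timescale (service caching)
    # Small timescale: GA produces an allocation per slot; these become states.
    states = [ga_allocate(caching) for _ in range(SLOTS_PER_FRAME)]
    caching = lstm_ddpg_actor(np.stack(states))
    print(f"frame {frame}: cached services = {np.flatnonzero(caching)}")
```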
Abstract: Deep reinforcement learning (DRL) has great potential for solving complex decision-making problems in autonomous driving, especially in mixed-traffic scenarios where autonomous vehicles and human-driven vehicles (HDVs) drive together. Safety is a key concern during both the training and deployment of reinforcement learning (RL) algorithms. In this paper, we formulate on-ramp merging as a Markov decision process (MDP) and solve it with an off-policy RL algorithm, namely Soft Actor-Critic for Discrete Action Settings (SAC-Discrete). In addition, a motion-predictive safety controller, consisting of a motion predictor and an action substitution module, is proposed to ensure driving safety during both training and testing. The motion predictor estimates the trajectories of the ego vehicle and surrounding vehicles using kinematic models and flags potential collisions. The action substitution module replaces risky actions, based on a safety-distance criterion, before sending them to the low-level controller. We train, evaluate, and test our approach in a gym-like highway simulator with three levels of traffic density. The simulation results show that even at the highest traffic density, our method significantly reduces the collision rate while maintaining high efficiency, outperforming several state-of-the-art baselines in the considered on-ramp merging scenarios. A video demo of the evaluation process can be found at: https://www.youtube.com/watch?v=7FvjbAM4oFw
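The safety layer described above, a predictor that rolls out kinematic models plus a module that swaps out risky actions, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the constant-acceleration rollout, the discrete action set, the safety gap, and all parameter values are hypothetical choices for demonstration.

```python
import numpy as np

# Hypothetical parameters (not taken from the paper).
DT, HORIZON = 0.1, 20   # prediction step [s] and number of steps
SAFE_GAP = 10.0         # minimum longitudinal safety distance [m]
# Assumed discrete action set (accelerations in m/s^2): brake / idle / accelerate.
ACTIONS = {0: -3.0, 1: 0.0, 2: 2.0}
SAFE_ACTION = 0         # fallback action: brake

def predict_positions(x, v, a):
    """Motion predictor: constant-acceleration kinematic rollout of a
    vehicle's longitudinal position over the horizon."""
    t = DT * np.arange(1, HORIZON + 1)
    return x + v * t + 0.5 * a * t**2

def is_risky(ego_x, ego_v, action, others):
    """Check whether the predicted ego trajectory violates the safety
    distance to any surrounding vehicle (assumed to keep constant speed)."""
    ego_traj = predict_positions(ego_x, ego_v, ACTIONS[action])
    for (x, v) in others:
        other_traj = predict_positions(x, v, 0.0)
        if np.any(np.abs(other_traj - ego_traj) < SAFE_GAP):
            return True
    return False

def safe_action(rl_action, ego_x, ego_v, others):
    """Action substitution: keep the RL action if it is predicted safe,
    otherwise replace it before it reaches the low-level controller."""
    if is_risky(ego_x, ego_v, rl_action, others):
        return SAFE_ACTION
    return rl_action

# Example: ego at 0 m doing 25 m/s wants to accelerate (action 2) while a
# lead vehicle sits 20 m ahead at 20 m/s; the module substitutes braking.
print(safe_action(rl_action=2, ego_x=0.0, ego_v=25.0, others=[(20.0, 20.0)]))
```

Filtering actions this way during both training and testing (as the abstract states) means the RL agent never executes a predicted-unsafe action, which is what keeps the collision rate low even while the policy is still learning.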