Abstract: This paper presents a novel and effective deep reinforcement learning (DRL)-based approach to joint resource management (JRM) in a practical multi-carrier non-orthogonal multiple access (MC-NOMA) system, where hardware sensitivity and imperfect successive interference cancellation (SIC) are considered. We first formulate the JRM problem to maximize the weighted-sum system throughput. The JRM problem is then decoupled into two iterative subtasks: subcarrier assignment (SA, including user grouping) and power allocation (PA). Each subtask is a sequential decision process. Invoking a deep deterministic policy gradient algorithm, our proposed DRL-based JRM (DRL-JRM) approach performs the two subtasks jointly, with the optimization objective and constraints of the subtasks addressed by a new joint reward and internal reward mechanism. A multi-agent structure and a convolutional neural network are adopted to reduce the complexity of the PA subtask. We also tailor the neural network structure to improve the stability and convergence of DRL-JRM. Extensive experiments show that the proposed DRL-JRM scheme is superior to existing alternatives in terms of system throughput and resistance to interference, especially in the presence of many users and strong inter-cell interference. DRL-JRM can also flexibly meet the individual service requirements of users.
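To make the weighted-sum throughput objective under imperfect SIC concrete, the following is a minimal sketch for a single subcarrier. It is not the system model of the paper: the decoding order (ascending channel gain), the residual-interference factor `sic_residual`, the function names, and all parameters are illustrative assumptions.

```python
import math

def noma_rates(powers, gains, noise=1.0, sic_residual=0.0, bandwidth=1.0):
    """Per-user rates on one subcarrier for downlink NOMA (assumed toy model).

    Users are ordered by ascending channel gain. Each user cancels the
    signals of weaker users, leaving a fraction `sic_residual` of their
    power as interference, and treats stronger users' signals as noise.
    """
    order = sorted(range(len(gains)), key=lambda i: gains[i])
    rates = [0.0] * len(gains)
    for pos, k in enumerate(order):
        weaker = sum(powers[j] for j in order[:pos])        # cancelled, imperfectly
        stronger = sum(powers[j] for j in order[pos + 1:])  # not cancelled at all
        interference = gains[k] * (stronger + sic_residual * weaker)
        sinr = powers[k] * gains[k] / (interference + noise)
        rates[k] = bandwidth * math.log2(1.0 + sinr)
    return rates

def weighted_sum_throughput(weights, powers, gains, **model):
    """Weighted sum of the per-user rates; `weights` are hypothetical priorities."""
    return sum(w * r for w, r in zip(weights, noma_rates(powers, gains, **model)))
```

In this sketch, setting `sic_residual=0.0` recovers ideal SIC, and any positive value lowers the rate of every user that relies on cancellation. An objective of this shape could serve as a reward signal, although the reward mechanism described in the abstract is more elaborate.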
Abstract: Aiming to minimize service delay, we propose a new random caching scheme for a device-to-device (D2D)-assisted heterogeneous network. To support the diversified viewing qualities of multimedia video services, each video file is encoded into a base layer (BL) and multiple enhancement layers (ELs) by scalable video coding (SVC). A super layer, comprising the BL and several ELs, is transmitted to every user. We define and quantify the service delay of multi-quality videos by deriving the successful transmission probabilities when a user is served by a D2D helper, a small-cell base station (SBS), or a macro-cell base station (MBS). We formulate a delay minimization problem subject to the limited cache sizes of D2D helpers and SBSs. The structure of the optimal solutions to the problem is revealed, and an improved version of the standard gradient projection method is then designed to obtain the solutions efficiently. Both theoretical analysis and Monte-Carlo simulations validate the successful transmission probabilities. Compared with three benchmark caching policies, the proposed SVC-based random caching scheme achieves a lower service delay.
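As an illustration of cache-constrained delay minimization, the sketch below runs the plain (not the improved) gradient projection method on a toy problem. The delay model, in which the delay of file i decays exponentially with its caching probability, and every name and parameter are assumptions made for this example only; they are not taken from the paper.

```python
import math

def project_cache(x, capacity):
    """Euclidean projection onto {q : 0 <= q_i <= 1, sum(q) <= capacity}."""
    clip = lambda v: min(1.0, max(0.0, v))
    q = [clip(v) for v in x]
    if sum(q) <= capacity:
        return q
    # Otherwise shift all entries down by a common nu >= 0, found by bisection.
    lo, hi = 0.0, max(x)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if sum(clip(v - mid) for v in x) > capacity:
            lo = mid
        else:
            hi = mid
    return [clip(v - hi) for v in x]

def minimize_delay(popularity, capacity, d_hit=1.0, d_miss=5.0, lam=3.0,
                   step=0.05, iters=500):
    """Projected gradient descent on the assumed toy delay
    sum_i popularity_i * (d_hit + (d_miss - d_hit) * exp(-lam * q_i))."""
    n = len(popularity)
    q = [min(1.0, capacity / n)] * n  # feasible starting point
    for _ in range(iters):
        grad = [-a * (d_miss - d_hit) * lam * math.exp(-lam * qi)
                for a, qi in zip(popularity, q)]
        q = project_cache([qi - step * g for qi, g in zip(q, grad)], capacity)
    return q
```

For example, `minimize_delay([0.5, 0.3, 0.2], capacity=1.0)` returns caching probabilities that use the full cache budget and assign higher probabilities to more popular files, which is the qualitative behavior expected of a random caching policy under this toy model.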