University of A Coruña
Abstract: The combination of multiple-input multiple-output (MIMO) systems and intelligent reflecting surfaces (IRSs) is foreseen as a critical enabler of beyond 5G (B5G) and 6G. In this work, two approaches are considered for the joint optimization of the IRS phase-shift matrix and the MIMO precoders of an IRS-assisted multi-stream (MS) multi-user MIMO (MU-MIMO) system. Both approaches aim to maximize the system sum rate for each channel realization. The first proposed solution is a novel contextual bandit (CB) framework with continuous state and action spaces, called deep contextual bandit-oriented deep deterministic policy gradient (DCB-DDPG). The second is an innovative deep reinforcement learning (DRL) formulation in which the states, actions, and rewards are chosen so that the Markov decision process (MDP) assumption of reinforcement learning (RL) is properly satisfied. Both proposals considerably outperform state-of-the-art heuristic methods in scenarios with high multi-user interference.
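Both approaches are driven by the per-realization sum rate. The following is a minimal NumPy sketch of how that reward could be computed, under assumptions that do not come from the abstract: a narrowband model in which user k sees the direct link plus the IRS-reflected link G_k Θ F, Gaussian signaling with interference treated as noise, and a hypothetical flat layout for the actor output (first M entries are phases, the rest are real and imaginary precoder parts). The function names are illustrative, and the DDPG actor and critic networks themselves are not shown.

```python
import numpy as np

def irs_effective_channels(H_d, G, F, theta):
    """H_k = H_d[k] + G[k] @ diag(exp(1j*theta)) @ F for every user k.

    H_d: (K, Nr, Nt) direct BS-user links; G: (K, Nr, M) IRS-user links;
    F: (M, Nt) BS-IRS link; theta: (M,) IRS phase shifts in radians.
    """
    Theta = np.diag(np.exp(1j * theta))  # unit-modulus reflection coefficients
    return H_d + G @ Theta @ F

def sum_rate(H, P, noise_var):
    """Sum over users of log2 det(I + S_k R_k^{-1}), treating interference as noise.

    H: (K, Nr, Nt) effective channels; P: (K, Nt, d) precoders, one d-stream block per user.
    """
    K, Nr, _ = H.shape
    total = 0.0
    for k in range(K):
        # covariance at receiver k of the signal intended for each user j
        covs = [H[k] @ P[j] @ P[j].conj().T @ H[k].conj().T for j in range(K)]
        R = noise_var * np.eye(Nr) + sum(covs[j] for j in range(K) if j != k)
        # det(I + R^{-1} S_k) equals det(I + S_k R^{-1}) and is real and positive
        _, logdet = np.linalg.slogdet(np.eye(Nr) + np.linalg.solve(R, covs[k]))
        total += logdet / np.log(2)
    return total

def action_to_design(action, M, K, Nt, d, p_max):
    """Map a flat actor output in [-1, 1] to IRS phases and precoders (hypothetical layout)."""
    action = np.asarray(action, dtype=float)
    theta = np.pi * action[:M]  # phases in [-pi, pi]
    n = K * Nt * d
    P = (action[M:M + n] + 1j * action[M + n:M + 2 * n]).reshape(K, Nt, d)
    # rescale so the total transmit power sum_k ||P_k||_F^2 equals p_max
    P = P * np.sqrt(p_max / np.sum(np.abs(P) ** 2))
    return theta, P

# Usage: one context (channel realization) and one random action give one reward
rng = np.random.default_rng(0)
K, Nr, Nt, M, d = 2, 2, 4, 8, 2

def crandn(*shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H_d, G, F = crandn(K, Nr, Nt), crandn(K, Nr, M), crandn(M, Nt)
action = rng.uniform(-1.0, 1.0, M + 2 * K * Nt * d)
theta, P = action_to_design(action, M, K, Nt, d, p_max=1.0)
reward = sum_rate(irs_effective_channels(H_d, G, F, theta), P, noise_var=0.1)
print(f"sum-rate reward: {reward:.3f} bit/s/Hz")
```

In a contextual bandit each channel realization is an independent context, so the reward is this immediate sum rate with no discounted future return. Because the actor only outputs phases, the IRS unit-modulus constraint holds by construction, and the power constraint is enforced by the final rescaling.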
Abstract: This work focuses on wideband intelligent reflecting surface (IRS)-aided multiuser MIMO systems. One of the major challenges in this scenario is the joint design of the frequency-dependent base station (BS) precoders and user filters and of the IRS phase-shift matrix, which is frequency-flat and common to all users. In addition, we consider that the channel state information (CSI) is imperfect at both the transmitter and the receivers. A statistical model for the imperfect CSI is developed and exploited in the system design. A minimum mean square error (MMSE) approach is followed to determine the IRS phase-shift matrix, the transmit precoders, and the receive filters. The broadcast channel (BC)-multiple access channel (MAC) duality is used to solve the optimization problem within an alternating minimization framework. Numerical results show that the proposed approach yields substantial performance gains over baseline strategies that neglect the inter-user interference and do not optimize the IRS phase-shift matrix. Further gains are obtained when the statistical information of the channel estimation errors is incorporated into the system design.
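One step of such an alternating minimization, the receive-filter update with the precoders and IRS phases held fixed, can be sketched in NumPy as follows. This is an illustration under assumptions not stated in the abstract: an additive error model H = Ĥ + E with i.i.d. CN(0, σ_e²) entries (the paper develops its own statistical model, which may differ), unit-power Gaussian symbols, and hypothetical function names. A single IRS phase vector is shared by all subcarriers and users, while the precoders vary per subcarrier. The BC-MAC duality-based precoder update and the IRS phase update are not shown.

```python
import numpy as np

def wideband_channels(H_d, G, F, theta):
    """Per-user, per-subcarrier effective channels with one frequency-flat IRS.

    H_d: (K, N, Nr, Nt) direct links; G: (K, N, Nr, M) IRS-user links;
    F: (N, M, Nt) BS-IRS link; theta: (M,) phases shared by all subcarriers and users.
    """
    Theta = np.diag(np.exp(1j * theta))
    return H_d + G @ Theta @ F  # broadcasts to (K, N, Nr, Nt)

def robust_mmse_receiver(H_hat, P, k, noise_var, err_var):
    """MMSE filter for user k, averaged over an assumed CSI-error model.

    Assumed model: H = H_hat + E, E with i.i.d. CN(0, err_var) entries, so that
    E[H A H^H] = H_hat A H_hat^H + err_var * tr(A) * I.
    H_hat: (Nr, Nt) estimate; P: (K, Nt, d) precoders on this subcarrier.
    """
    Nr = H_hat.shape[0]
    R = noise_var * np.eye(Nr, dtype=complex)  # E[y y^H]
    for Pj in P:
        A = Pj @ Pj.conj().T
        R += H_hat @ A @ H_hat.conj().T + err_var * np.trace(A).real * np.eye(Nr)
    C = H_hat @ P[k]  # E[y s_k^H], since E has zero mean
    W = np.linalg.solve(R, C).conj().T  # W = C^H R^{-1}, using R = R^H
    return W, R

def expected_mse(W, H_hat, P, k, R):
    """E||s_k - W y||^2 = tr(I - W C - C^H W^H + W R W^H), with C = H_hat P_k."""
    d = P[k].shape[1]
    T = W @ H_hat @ P[k]
    return np.trace(np.eye(d) - T - T.conj().T + W @ R @ W.conj().T).real

# Usage: filter update for fixed (hypothetical) precoders and IRS phases
rng = np.random.default_rng(0)
K, N, Nr, Nt, M, d = 2, 4, 2, 4, 8, 1

def crandn(*shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

theta = rng.uniform(-np.pi, np.pi, M)
H_hat = wideband_channels(crandn(K, N, Nr, Nt), crandn(K, N, Nr, M), crandn(N, M, Nt), theta)
P = 0.5 * crandn(N, K, Nt, d)  # frequency-dependent precoders, fixed for this step
total_mse = 0.0
for n in range(N):
    for k in range(K):
        W, R = robust_mmse_receiver(H_hat[k, n], P[n], k, noise_var=0.1, err_var=0.01)
        total_mse += expected_mse(W, H_hat[k, n], P[n], k, R)
print(f"sum MSE over users and subcarriers: {total_mse:.3f}")
```

Under this assumed model the estimation error enters only through σ_e² tr(P_j P_j^H) I, i.e. as extra effective noise that grows with transmit power; setting `err_var=0` recovers the naive MMSE filter that treats Ĥ as exact.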