Abstract: This paper addresses the problem of pilot contamination in a multi-cell massive multiple-input multiple-output (M-MIMO) system using deep reinforcement learning (DRL). To this end, a pilot assignment strategy is designed that adapts to channel variations while keeping the pilot contamination effect at a tolerable level. Using the angle of arrival (AoA) information of the users, a cost function representing the reward is presented that characterizes the pilot contamination effects in the system. Numerical results illustrate that the DRL-based scheme is able to track changes in the environment, learn a near-optimal pilot assignment, and achieve performance close to that of the optimum pilot assignment obtained by exhaustive search, while maintaining low computational complexity.