Abstract: This paper considers the problem of multi-target detection for massive multiple-input multiple-output (MMIMO) cognitive radar (CR). The concept of CR is based on the perception-action cycle, in which the radar senses and intelligently adapts to a dynamic environment in order to optimally satisfy a specific mission. However, this usually requires a priori knowledge of the environmental model, which is not available in most cases. We propose a reinforcement learning (RL) based algorithm for cognitive beamforming in the presence of unknown disturbance statistics. The radar acts as an agent that continuously senses the unknown environment (i.e., targets and disturbance) and, based on the acquired information, optimizes the beamformers by tailoring the beampattern. Furthermore, we propose a solution to the beamforming optimization problem with lower complexity than existing methods. Numerical simulations are performed to assess the performance of the proposed RL-based algorithm in both stationary and dynamic environments. The RL-based beamformer is compared to the conventional omnidirectional approach with equal power allocation. As the numerical results show, our RL-based beamformer greatly outperforms the omnidirectional one in terms of target detection performance. The performance improvement is even more remarkable under harsh environmental conditions such as low SNR, heavy-tailed disturbance, and rapidly changing scenarios.