Motivated by the growing interest in integrated sensing and communication for 6th generation (6G) networks, this paper presents a cognitive Multiple-Input Multiple-Output (MIMO) radar system enhanced by reinforcement learning (RL) for robust multitarget detection in dynamic environments. The system employs a planar array configuration and adapts its transmitted waveforms and beamforming patterns to optimize detection performance in the presence of unknown two-dimensional (2D) disturbances. A robust Wald-type detector is integrated with a SARSA-based RL algorithm, enabling the radar to learn and adapt to complex clutter environments modeled by a 2D autoregressive process. Simulation results demonstrate significant improvements in detection probability compared to omnidirectional methods, particularly for low Signal-to-Noise Ratio (SNR) targets masked by clutter.