Abstract: We study jamming of an OFDM-modulated signal that employs forward error correction coding. We extend this study by using reinforcement learning, in the form of a contextual bandit, to jam a 5G-based system that implements some aspects of the 5G protocol. The model gives the jammer unreliable reward feedback in the form of ACK/NACK observations, which lets us examine how imperfect observations of errors affect the jammer's ability to learn. We gain insights into the convergence time of the jammer and its ability to jam a victim 5G waveform, as well as into the vulnerabilities of wireless communications to reinforcement learning-based jamming.