We propose an over-the-air digital predistortion optimization algorithm based on reinforcement learning. Using a symbol-based criterion, the algorithm minimizes the errors between downsampled messages at the receiver side. It requires no knowledge of the underlying hardware or channel. For a generalized memory polynomial power amplifier and an additive white Gaussian noise channel, we show that the proposed algorithm achieves a lower symbol error rate than an indirect learning architecture, even when the latter is coupled with a full-sampling-rate analog-to-digital converter (ADC) in the feedback path. Furthermore, it maintains a satisfactory adjacent channel power ratio.
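
To make the symbol-based criterion concrete, the following is a minimal sketch of how a symbol-rate error could be measured at the receiver. It is not the algorithm proposed here: it assumes a plain memory polynomial (a special case of the generalized memory polynomial), a rectangular pulse, and no predistorter or learning loop. The function names `memory_polynomial_pa` and `symbol_loss`, the coefficient layout, and the oversampling factor `sps` are all hypothetical choices made for illustration.

```python
import numpy as np


def memory_polynomial_pa(x, coeffs):
    """Assumed PA model: y[n] = sum_{k,l} coeffs[k, l] * x[n-l] * |x[n-l]|**k."""
    x = np.asarray(x, dtype=complex)
    y = np.zeros_like(x)
    num_orders, num_taps = coeffs.shape
    for l in range(num_taps):
        # Delay the input by l samples, padding the start with zeros.
        xl = np.concatenate([np.zeros(l, dtype=complex), x[:len(x) - l]])
        for k in range(num_orders):
            y += coeffs[k, l] * xl * np.abs(xl) ** k
    return y


def symbol_loss(symbols, coeffs, sps=4, noise_std=0.0, rng=None):
    """Mean squared error between sent symbols and downsampled received samples."""
    rng = np.random.default_rng() if rng is None else rng
    symbols = np.asarray(symbols, dtype=complex)
    # Rectangular pulse shaping (assumption): hold each symbol for sps samples.
    x = np.repeat(symbols, sps)
    y = memory_polynomial_pa(x, coeffs)
    # Complex AWGN with total variance noise_std**2 per sample.
    noise = rng.standard_normal(len(y)) + 1j * rng.standard_normal(len(y))
    y = y + noise_std * noise / np.sqrt(2)
    # Downsample to the symbol rate: one sample per symbol period.
    received = y[sps // 2::sps]
    return float(np.mean(np.abs(received - symbols) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bits = rng.integers(0, 2, size=(64, 2))
    # Unit-magnitude QPSK symbols.
    qpsk = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
    compressive = np.array([[1.0], [0.0], [-0.1]], dtype=complex)
    print(symbol_loss(qpsk, compressive, sps=4, noise_std=0.05, rng=rng))
```

Because the error is computed only on the downsampled samples, a quantity of this kind needs no full-rate observation of the amplifier output. In a learning loop, its negative could plausibly serve as a reward signal, though that mapping is an assumption of this sketch.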