We introduce learned attention models into the radio machine learning domain for the task of modulation recognition by leveraging spatial transformer networks and introducing new transformations appropriate to the radio domain. This attention model allows the network to learn a localization network capable of blindly synchronizing and normalizing a radio signal, with no prior knowledge of the signal's structure, by optimizing the network for classification accuracy, sparse representation, and regularization. Using this architecture, we outperform our prior results in classification accuracy versus signal-to-noise ratio when compared against an otherwise identical system without attention. We believe such an attention model has implications far beyond the task of modulation recognition.