Many electroencephalography (EEG) applications rely on channel selection methods to remove the least informative channels, e.g., to reduce the amount of electrodes to be mounted, to decrease the computational load, or to reduce overfitting effects and improve performance. Wrapper-based channel selection methods aim to match the channel selection step to the target model, yet they require to re-train the model multiple times on different candidate channel subsets, which often leads to an unacceptably high computational cost, especially when said model is a (deep) neural network. To alleviate this, we propose a framework to embed the EEG channel selection in the neural network itself to jointly learn the network weights and optimal channels in an end-to-end manner by traditional backpropagation algorithms. We deal with the discrete nature of this new optimization problem by employing continuous relaxations of the discrete channel selection parameters based on the Gumbel-softmax trick. We also propose a regularization method that discourages selecting channels more than once. This generic approach is evaluated on two different EEG tasks: motor imagery brain-computer interfaces and auditory attention decoding. The results demonstrate that our framework is generally applicable, while being competitive with state-of-the art EEG channel selection methods, tailored to these tasks.