The development of optimal and efficient machine learning-based communication systems is likely to be a key enabler of beyond-5G communication technologies. In this direction, physical layer design has recently been reformulated under a deep learning framework in which the autoencoder paradigm treats the full communication system as an end-to-end coding-decoding problem. Given a loss function, the autoencoder jointly learns the optimal coding and decoding blocks under a certain channel model. Because performance in communications is typically measured in terms of achievable rates and channel capacity, the mutual information between channel input and output can be included in the end-to-end training process, which makes its estimation essential. In this paper, we present a set of novel discriminative mutual information estimators and discuss how to exploit them to design capacity-approaching codes and, ultimately, to estimate the channel capacity.
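
For reference, the two quantities named above can be written in their standard textbook form. The notation is an assumption introduced here for illustration and is not taken from the paper: $X$ denotes the channel input, $Y$ the channel output, $p_{XY}$ their joint density, and $p_X$, $p_Y$ the marginals.

```latex
% Mutual information between channel input X and output Y
% (standard definition; the symbols are illustrative assumptions)
I(X;Y) = \mathbb{E}_{(x,y) \sim p_{XY}}
         \left[ \log \frac{p_{XY}(x,y)}{p_X(x)\, p_Y(y)} \right]

% Channel capacity: mutual information maximized over the input distribution
C = \max_{p_X} I(X;Y)
```

Under this reading, a capacity-approaching code is one whose induced input distribution brings $I(X;Y)$ close to $C$, which is why an estimate of the mutual information can serve as a training signal.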