The autoencoder concept has fostered the reinterpretation and design of modern communication systems. An autoencoder consists of an encoder, a channel, and a decoder block, whose internal neural structures are adapted in an end-to-end learning fashion. However, the current approach to training an autoencoder relies on the cross-entropy loss function. This approach can be prone to overfitting and often fails to learn an optimal system and signal representation (code). In addition, little is known about the autoencoder's ability to design channel capacity-approaching codes, i.e., codes that maximize the input-output mutual information under a given power constraint. The task is even more formidable for an unknown channel, whose capacity is unknown and must therefore be learned. In this paper, we address the challenge of designing capacity-approaching codes by incorporating the presence of the communication channel into a novel loss function for autoencoder training. In particular, we exploit the mutual information between the transmitted and received signals as a regularization term in the cross-entropy loss function, with the aim of controlling the amount of information stored. By jointly maximizing the mutual information and minimizing the cross-entropy, we propose a methodology that a) computes an estimate of the channel capacity and b) constructs an optimal coded signal approaching it. Several simulation results offer evidence of the potential of the proposed method.
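
As a concrete illustration of the regularized objective described above (a sketch under our own assumptions: the weighting coefficient $\beta$ and the symbols $X$, $Y$ are notational choices, not fixed by the text), the training loss can be written as
\[
\mathcal{L}(\theta) \;=\; \mathcal{L}_{\mathrm{CE}}(\theta) \;-\; \beta\, I(X;Y), \qquad \beta > 0,
\]
where $\mathcal{L}_{\mathrm{CE}}$ is the cross-entropy between transmitted and decoded messages, $I(X;Y)$ is the (estimated) mutual information between the channel input $X$ and output $Y$, and $\beta$ trades off decoding accuracy against the information carried through the channel. Jointly minimizing $\mathcal{L}_{\mathrm{CE}}$ and maximizing $I(X;Y)$ under the power constraint drives the learned code toward capacity, and the converged value of $I(X;Y)$ provides the capacity estimate.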