Extracting information from the electrocardiography (ECG) signal is an essential step in the design of digital health technologies in cardiology. In recent years, several machine learning (ML) algorithms for automatically extracting information from ECG have been proposed. Supervised learning methods have been used successfully to identify specific aspects of the signal, such as rhythm disorders (arrhythmias). Self-supervised learning (SSL) methods, on the other hand, aim to capture all the features contained in the data: the model is optimized without a task-specific goal and learns from the data itself. A few SSL approaches for ECG processing have been reported recently, adapting state-of-the-art computer vision methodologies to the signal processing domain. However, such SSL methods require either data augmentation or negative pairs, which limits them to looking only for similarities between two ECG inputs: either two versions of the same signal or two signals from the same subject. This leads to models that are very effective at extracting characteristics that are stable within a subject, such as gender or age, but that are not successful at capturing changes within an ECG recording that can explain dynamic aspects, such as different arrhythmias or different sleep stages. In this work, we introduce the first SSL method for understanding ECG signals that uses neither data augmentation nor negative pairs and still achieves representations of comparable quality. As a result, it is possible to design an SSL method that captures not only similarities between two inputs but also dissimilarities, for a complete understanding of the data. In addition, we present a model based on transformer blocks, which produces better results than a model based on convolutional layers (XResNet50) with almost the same number of parameters.