Electrocardiogram (ECG) has been widely used for emotion recognition. This paper presents a deep neural network based on convolutional layers and a transformer mechanism to detect stress using ECG signals. We perform leave-one-subject-out experiments on two publicly available datasets, WESAD and SWELL-KW, to evaluate our method. Our experiments show that the proposed model achieves strong results, comparable or better than the state-of-the-art models for ECG-based stress detection on these two datasets. Moreover, our method is end-to-end, does not require handcrafted features, and can learn robust representations with only a few convolutional blocks and the transformer component.