Electroencephalography (EEG) signals can objectively reflect emotional states and their changes. However, the transmission mechanism of EEG in the brain and its internal relationship with emotion remain poorly understood. This paper presents a novel approach to EEG emotion recognition built exclusively on self-attention over the spectral, spatial, and temporal dimensions, in order to explore the contribution of different EEG electrodes and temporal slices to specific emotional states. Our method, named EEG emotion Transformer (EeT), adapts the conventional Transformer architecture to EEG signals by enabling spatiospectral feature learning directly from sequences of EEG signals. Our experimental results demonstrate that "joint attention", in which temporal and spatial attention are applied simultaneously within each block, achieves the best emotion recognition accuracy among the design choices. In addition, compared with other competitive methods, the proposed method achieves state-of-the-art results on the SEED and SEED-IV datasets.
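The "joint attention" variant described above can be illustrated with a minimal NumPy sketch: every (electrode, temporal-slice) pair becomes one token, and a single self-attention layer attends over all tokens at once, rather than factorizing attention into separate spatial and temporal passes. The 62-electrode count matches the cap used for SEED; the slice count and feature width here are illustrative assumptions, not the paper's actual hyperparameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def joint_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over ALL space-time tokens jointly.

    x: (n_tokens, d) where n_tokens = n_electrodes * n_slices.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    d = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d))  # each token attends to every other
    return weights @ v

# Illustrative sizes: 62 electrodes (SEED cap), 5 temporal slices, 16-dim features.
rng = np.random.default_rng(0)
n_electrodes, n_slices, d = 62, 5, 16
tokens = rng.standard_normal((n_electrodes * n_slices, d))  # flatten space x time
wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out = joint_attention(tokens, wq, wk, wv)
print(out.shape)  # → (310, 16)
```

A factorized ("divided") alternative would instead attend over the 62 electrodes within each slice and over the 5 slices for each electrode in separate steps; the abstract's finding is that the joint formulation performs best.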