Electroencephalogram (EEG)-based emotion recognition holds significant potential for brain-computer interfaces. A key challenge lies in extracting discriminative spatiotemporal features from EEG signals. Existing studies often rely on domain-specific time-frequency features and analyze temporal dependencies and spatial characteristics separately, neglecting the interaction between local and global relationships as well as between temporal and spatial dynamics. To address this, we propose a novel network, Multi-Scale Inverted Mamba (MS-iMamba), composed of Multi-Scale Temporal Blocks (MSTBs) and Temporal-Spatial Fusion Blocks (TSFBs). Specifically, the MSTBs capture both local details and global temporal dependencies across subsequences at different scales. The TSFBs, implemented with an inverted Mamba structure, model the interaction between dynamic temporal dependencies and spatial characteristics. The primary advantage of MS-iMamba is that it operates on reconstructed multi-scale EEG sequences, exploiting the interaction between temporal and spatial features without domain-specific time-frequency feature extraction. Experimental results on the DEAP, DREAMER, and SEED datasets demonstrate that MS-iMamba achieves classification accuracies of 94.86%, 94.94%, and 91.36%, respectively, using only four-channel EEG signals, outperforming state-of-the-art methods.
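To make the two ideas in the abstract concrete, the sketch below shows (a) building multi-scale subsequences of a raw EEG segment and (b) "inverted" tokenization, in which each channel's entire series becomes one token so that a sequence model mixes information across channels (spatial) rather than across time steps. The scale set, average pooling, and the 128-sample/4-channel segment are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def multi_scale_views(x, scales=(1, 2, 4)):
    """Build multi-scale subsequences of an EEG segment via average pooling.

    x: (time, channels) array. Returns one (time // s, channels) view per
    scale s. Pooling is an assumed stand-in for the paper's reconstruction.
    """
    views = []
    for s in scales:
        t = (x.shape[0] // s) * s              # trim so time divides evenly
        pooled = x[:t].reshape(-1, s, x.shape[1]).mean(axis=1)
        views.append(pooled)
    return views

def invert(view):
    """Inverted tokenization: (time, channels) -> (channels, time),
    so each channel's whole temporal series is one token."""
    return view.T

rng = np.random.default_rng(0)
eeg = rng.standard_normal((128, 4))            # 4-channel segment, 128 samples
tokens = [invert(v) for v in multi_scale_views(eeg)]
print([t.shape for t in tokens])               # [(4, 128), (4, 64), (4, 32)]
```

Each token matrix would then be fed to a per-scale sequence block (Mamba in the paper); because tokens index channels, the recurrence models cross-channel (spatial) interactions while the token features retain the temporal dynamics.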