The meteorological radar reflectivity data, also known as echo, plays a crucial role in predicting precipitation and enabling accurate and fast forecasting of short-term heavy rainfall without the need for complex Numerical Weather Prediction (NWP) model. Compared to conventional model, Deep Learning (DL)-based radar echo extrapolation algorithms are more effective and efficient. However, the development of highly reliable and generalized algorithms is hindered by three main bottlenecks: cumulative error spreading, imprecise representation of sparse echo distribution, and inaccurate description of non-stationary motion process. To address these issues, this paper presents a novel radar echo extrapolation algorithm that utilizes temporal-spatial correlation features and the Transformer technology. The algorithm extracts features from multi-frame echo images that accurately represent non-stationary motion processes for precipitation prediction. The proposed algorithm uses a novel parallel encoder based on Transformer technology to effectively and automatically extract echoes' temporal-spatial features. Furthermore, a Multi-level Temporal-Spatial attention mechanism is adopted to enhance the ability to perceive global-local information and highlight the task-related feature regions in a lightweight way. The proposed method's effectiveness has been valided on the classic radar echo extrapolation task using the real-world dataset. Numerous experiments have further demonstrated the effectiveness and necessity of various components of the proposed method.