There is a great need to accurately predict short-term precipitation, which has socioeconomic effects such as agriculture and disaster prevention. Recently, the forecasting models have employed multi-source data as the multi-modality input, thus improving the prediction accuracy. However, the prevailing methods usually suffer from the desynchronization of multi-source variables, the insufficient capability of capturing spatio-temporal dependency, and unsatisfactory performance in predicting extreme precipitation events. To fix these problems, we propose a short-term precipitation forecasting model based on spatio-temporal alignment attention, with SATA as the temporal alignment module and STAU as the spatio-temporal feature extractor to filter high-pass features from precipitation signals and capture multi-term temporal dependencies. Based on satellite and ERA5 data from the southwestern region of China, our model achieves improvements of 12.61\% in terms of RMSE, in comparison with the state-of-the-art methods.