Accurate passenger flow prediction of urban rail transit is essential for improving the performance of intelligent transportation systems, especially during the epidemic. How to dynamically model the complex spatiotemporal dependencies of passenger flow is the main issue in achieving accurate passenger flow prediction during the epidemic. To solve this issue, this paper proposes a brand-new transformer-based architecture called STformer under the encoder-decoder framework specifically for COVID-19. Concretely, we develop a modified self-attention mechanism named Causal-Convolution ProbSparse Self-Attention (CPSA) to model the multiple temporal dependencies of passenger flow with low computational costs. To capture the complex and dynamic spatial dependencies, we introduce a novel Adaptive Multi-Graph Convolution Network (AMGCN) by leveraging multiple graphs in a self-adaptive manner. Additionally, the Multi-source Data Fusion block fuses the passenger flow data, COVID-19 confirmed case data, and the relevant social media data to study the impact of COVID-19 to passenger flow. Experiments on real-world passenger flow datasets demonstrate the superiority of ST-former over the other eleven state-of-the-art methods. Several ablation studies are carried out to verify the effectiveness and reliability of our model structure. Results can provide critical insights for the operation of URT systems.