Abstract:In the context of rail transit operations, real-time passenger flow prediction is essential; however, most models primarily focus on normal conditions, with limited research addressing incident situations. There are several intrinsic challenges associated with prediction during incidents, such as a lack of interpretability and data scarcity. To address these challenges, we propose a two-stage method that separates predictions under normal conditions and the causal effects of incidents. First, a normal prediction model is trained using data from normal situations. Next, the synthetic control method is employed to identify the causal effects of incidents, combined with placebo tests to determine significant levels of these effects. The significant effects are then utilized to train a causal effect prediction model, which can forecast the impact of incidents based on features of the incidents and passenger flows. During the prediction phase, the results from both the normal situation model and the causal effect prediction model are integrated to generate final passenger flow predictions during incidents. Our approach is validated using real-world data, demonstrating improved accuracy. Furthermore, the two-stage methodology enhances interpretability. By analyzing the causal effect prediction model, we can identify key influencing factors related to the effects of incidents and gain insights into their underlying mechanisms. Our work can assist subway system managers in estimating passenger flow affected by incidents and enable them to take proactive measures. Additionally, it can deepen researchers' understanding of the impact of incidents on subway passenger flows.
Abstract:Short-term traffic volume prediction is crucial for intelligent transportation system and there are many researches focusing on this field. However, most of these existing researches concentrated on refining model architecture and ignored amount of training data. Therefore, there remains a noticeable gap in thoroughly exploring the effect of augmented dataset, especially extensive historical data in training. In this research, two datasets containing taxi and bike usage spanning over eight years in New York were used to test such effects. Experiments were conducted to assess the precision of models trained with data in the most recent 12, 24, 48, and 96 months. It was found that the training set encompassing 96 months, at times, resulted in diminished accuracy, which might be owing to disparities between historical traffic patterns and present ones. An analysis was subsequently undertaken to discern potential sources of inconsistent patterns, which may include both covariate shift and concept shift. To address these shifts, we proposed an innovative approach that aligns covariate distributions using a weighting scheme to manage covariate shift, coupled with an environment aware learning method to tackle the concept shift. Experiments based on real word datasets demonstrate the effectiveness of our method which can significantly decrease testing errors and ensure an improvement in accuracy when training with large-scale historical data. As far as we know, this work is the first attempt to assess the impact of contiguously expanding training dataset on the accuracy of traffic prediction models. Besides, our training method is able to be incorporated into most existing short-term traffic prediction models and make them more suitable for long term historical training dataset.