Abstract: Real-time demand prediction is a critical input for dynamic bus routing. Although researchers have developed numerous complex methods to predict short-term transit demand, their applications have been limited to short, stable time frames and a handful of stations. How these methods perform in highly dynamic environments has not been studied, nor has their performance been systematically compared. We built an open-source infrastructure implementing five common methodologies, including econometric and deep learning approaches, and assessed their performance under stable and highly dynamic conditions. Using a time series derived from smartcard data, we predicted next-day demand for the BRT system in Bogota, Colombia. The dynamic conditions in the time series include a month-long protest and the COVID-19 pandemic, both of which triggered drastic shifts in demand. The results reveal that most tested models perform similarly under stable conditions, with MAAPE ranging from 0.08 to 0.12. The benchmark showed that all models performed significantly worse under both dynamic conditions than under stable conditions. During the month-long protest, MAAPE rose to between 0.14 and 0.24; during the COVID-19 pandemic, it rose to between 0.12 and 0.82. Notably, under the COVID-19 pandemic condition, an LSTM model with adaptive training and a multi-output design outperformed the other models by adapting faster to the disruption: its prediction error stabilized within approximately 1.5 months, whereas the other models still exhibited higher error rates a year after the pandemic began. This open-source codebase aims to lower the barrier for other researchers to replicate and reproduce models, to facilitate a collective effort within the research community to improve the benchmarking process, and to accelerate the advancement of short-term ridership prediction models.
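The abstract reports accuracy as MAAPE without defining it. MAAPE (mean arctangent absolute percentage error) is commonly defined as the mean over days t of arctan(|(A_t - F_t) / A_t|), where A_t is observed ridership and F_t the forecast. The Python sketch below implements that standard definition; the function name `maape` and the convention for zero-ridership days are assumptions for illustration, not code taken from the benchmark's codebase.

```python
import math
from typing import Sequence


def maape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean arctangent absolute percentage error (MAAPE).

    Each term is arctan(|(A_t - F_t) / A_t|), which lies in [0, pi/2].
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        if a == 0:
            # Assumed convention for zero observed ridership: an exact zero
            # forecast scores 0; any other forecast gets the limit value pi/2.
            total += 0.0 if f == 0 else math.pi / 2
        else:
            total += math.atan(abs((a - f) / a))
    return total / len(actual)


# Usage with hypothetical daily boardings (not data from the study):
print(maape([100, 200], [100, 200]))  # 0.0
print(maape([100], [110]))            # atan(0.1), about 0.0997
```

Because each term is bounded by pi/2 (about 1.571), a single day of near-zero ridership, such as during a lockdown, cannot dominate the average the way it can with MAPE, which makes the metric well suited to disrupted periods.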
Abstract: In the last decade, the digital age has sharply redefined the way we study human behavior. With advances in data storage and sensing technologies, electronic records now encompass a diverse spectrum of human activity, ranging from location data and phone and email communication to Twitter activity and open-source contributions to Wikipedia and OpenStreetMap. In particular, studying the shopping and mobility patterns of individual consumers has the potential to give deeper insight into the lifestyles and infrastructure of a region. Credit card records (CCRs) provide detailed insight into purchase behavior and have been found to exhibit inherent regularity in consumer shopping patterns; call detail records (CDRs) present new opportunities to understand human mobility, analyze wealth, and model social network dynamics. In this chapter, we jointly model the lifestyles of individuals, a more challenging problem with higher variability than modeling the aggregated behavior of city regions. Using collective matrix factorization, we propose a unified dual view of lifestyles. Understanding these lifestyles will not only inform commercial opportunities but also help policymakers and nonprofit organizations understand the characteristics and needs of the entire region, as well as of the individuals within it. Applications range from targeted advertisements and promotions to the diffusion of digital financial services among low-income groups.
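Collective matrix factorization couples several matrices that share one dimension by giving them a common factor matrix. The sketch below assumes two dense, fully observed matrices over the same users: a hypothetical user-by-spending-category matrix X built from CCRs and a hypothetical user-by-mobility-feature matrix Y built from CDRs. It fits X approximately equal to U A^T and Y approximately equal to U B^T with ridge-regularized alternating least squares, so the rows of U act as joint lifestyle profiles. The matrix names, rank, regularization weight `lam`, and the ALS solver are illustrative assumptions; the chapter's own formulation (loss weights, constraints, handling of missing entries) may differ.

```python
import numpy as np


def collective_mf(X, Y, rank=5, lam=0.1, n_iters=50, seed=0):
    """Fit X ~ U @ A.T and Y ~ U @ B.T with a shared user factor U.

    Minimizes ||X - U A^T||^2 + ||Y - U B^T||^2
              + lam * (||U||^2 + ||A||^2 + ||B||^2)
    by alternating least squares. Assumes X and Y are dense and fully observed.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must have the same rows (users)")
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(X.shape[0], rank))
    A = rng.normal(scale=0.1, size=(X.shape[1], rank))
    B = rng.normal(scale=0.1, size=(Y.shape[1], rank))
    ridge = lam * np.eye(rank)
    Z = np.hstack([X, Y])  # users x (CCR features + CDR features)
    for _ in range(n_iters):
        # U-step: both views share U, so solve against the stacked item factors.
        C = np.vstack([A, B])
        U = np.linalg.solve(C.T @ C + ridge, C.T @ Z.T).T
        # A- and B-steps: independent ridge regressions given the shared U.
        G = U.T @ U + ridge
        A = np.linalg.solve(G, U.T @ X).T
        B = np.linalg.solve(G, U.T @ Y).T
    return U, A, B
```

Stacking X and Y for the U-step is what makes the factorization "collective": each user's profile must explain both their spending and their mobility at once, rather than being fit to either source alone.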
Abstract: This paper presents an example of how the demographic characteristics of patients influence their susceptibility to certain medical conditions. We investigate the association between health conditions and patient age in a heterogeneous population, and show that, besides the symptoms a patient presents, age has the potential to aid the diagnostic process in hospitals. Working with Electronic Health Records (EHR), we show that medical conditions group into clusters that share distinctive population age densities. We use Electronic Health Records from Brazil covering a 15-month period from March 2013 to July 2014; the data contain 1.7 million patients and 47 million records. These findings could help in settings where an automated system predicts a patient's condition from their symptoms and demographic information.
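One way to make "clusters that share distinctive population age densities" concrete is to turn each condition's patient ages into a normalized histogram and cluster those histograms. The sketch below uses 5-year age bins, Jensen-Shannon distance, and average-linkage hierarchical clustering from SciPy; all three are choices made for illustration, not the paper's stated method, and the input dictionary of ages per condition is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Assumed binning: 5-year age bins covering 0 to 100 (ages outside are dropped).
AGE_BINS = np.arange(0, 105, 5)


def age_density(ages):
    """Normalized age histogram for one condition's patients."""
    counts, _ = np.histogram(ages, bins=AGE_BINS)
    if counts.sum() == 0:
        raise ValueError("no ages fall inside AGE_BINS")
    return counts / counts.sum()


def cluster_conditions(ages_by_condition, n_clusters=2):
    """Group conditions whose patient age densities look alike.

    ages_by_condition: hypothetical dict mapping condition name -> patient ages.
    Returns a dict mapping condition name -> cluster label (1..n_clusters).
    """
    names = sorted(ages_by_condition)
    densities = np.array([age_density(ages_by_condition[c]) for c in names])
    # Jensen-Shannon distance between every pair of age densities.
    dist = pdist(densities, metric="jensenshannon")
    tree = linkage(dist, method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return dict(zip(names, labels.tolist()))
```

Jensen-Shannon distance is used here because it compares probability distributions directly and stays finite when one condition has no patients in an age bin that another condition does. Passing conditions that are mostly pediatric alongside ones that are mostly geriatric with `n_clusters=2` should place the two groups in separate clusters.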