Abstract:Vision language models (VLMs) have demonstrated impressive performance across a wide range of downstream tasks. However, their proficiency in spatial reasoning remains limited, despite its crucial role in tasks involving navigation and interaction with physical environments. Specifically, much of the spatial reasoning in these tasks occurs in two-dimensional (2D) environments, and our evaluation reveals that state-of-the-art VLMs frequently generate implausible and incorrect responses to composite spatial reasoning problems, including simple pathfinding tasks that humans can solve effortlessly at a glance. To address this, we explore an effective approach to enhance 2D spatial reasoning within VLMs by training the model on basic spatial capabilities. We begin by disentangling the key components of 2D spatial reasoning: direction comprehension, distance estimation, and localization. Our central hypothesis is that mastering these basic spatial capabilities can significantly enhance a model's performance on composite spatial tasks requiring advanced spatial understanding and combinatorial problem-solving. To investigate this hypothesis, we introduce Sparkle, a framework that fine-tunes VLMs on these three basic spatial capabilities by synthetic data generation and targeted supervision to form an instruction dataset for each capability. Our experiments demonstrate that VLMs fine-tuned with Sparkle achieve significant performance gains, not only in the basic tasks themselves but also in generalizing to composite and out-of-distribution spatial reasoning tasks (e.g., improving from 13.5% to 40.0% on the shortest path problem). These findings underscore the effectiveness of mastering basic spatial capabilities in enhancing composite spatial problem-solving, offering insights for improving VLMs' spatial reasoning capabilities.
Abstract:Graph Neural Networks deliver strong classification results but often suffer from poor calibration performance, leading to overconfidence or underconfidence. This is particularly problematic in high stakes applications where accurate uncertainty estimates are essential. Existing post hoc methods, such as temperature scaling, fail to effectively utilize graph structures, while current GNN calibration methods often overlook the potential of leveraging diverse input information and model ensembles jointly. In the paper, we propose Graph Ensemble Temperature Scaling, a novel calibration framework that combines input and model ensemble strategies within a Graph Mixture of Experts archi SOTA calibration techniques, reducing expected calibration error by 25 percent across 10 GNN benchmark datasets. Additionally, GETS is computationally efficient, scalable, and capable of selecting effective input combinations for improved calibration performance.
Abstract:Transportation mode share analysis is important to various real-world transportation tasks as it helps researchers understand the travel behaviors and choices of passengers. A typical example is the prediction of communities' travel mode share by accounting for their sociodemographics like age, income, etc., and travel modes' attributes (e.g. travel cost and time). However, there exist only limited efforts in integrating the structure of the urban built environment, e.g., road networks, into the mode share models to capture the impacts of the built environment. This task usually requires manual feature engineering or prior knowledge of the urban design features. In this study, we propose deep hybrid models (DHM), which directly combine road networks and sociodemographic features as inputs for travel mode share analysis. Using graph embedding (GE) techniques, we enhance travel demand models with a more powerful representation of urban structures. In experiments of mode share prediction in Chicago, results demonstrate that DHM can provide valuable spatial insights into the sociodemographic structure, improving the performance of travel demand models in estimating different mode shares at the city level. Specifically, DHM improves the results by more than 20\% while retaining the interpretation power of the choice models, demonstrating its superiority in interpretability, prediction accuracy, and geographical insights.
Abstract:The rapid growth of the ride-hailing industry has revolutionized urban transportation worldwide. Despite its benefits, equity concerns arise as underserved communities face limited accessibility to affordable ride-hailing services. A key issue in this context is the vehicle rebalancing problem, where idle vehicles are moved to areas with anticipated demand. Without equitable approaches in demand forecasting and rebalancing strategies, these practices can further deepen existing inequities. In the realm of ride-hailing, three main facets of fairness are recognized: algorithmic fairness, fairness to drivers, and fairness to riders. This paper focuses on enhancing both algorithmic and rider fairness through a novel vehicle rebalancing method. We introduce an approach that combines a Socio-Aware Spatial-Temporal Graph Convolutional Network (SA-STGCN) for refined demand prediction and a fairness-integrated Matching-Integrated Vehicle Rebalancing (MIVR) model for subsequent vehicle rebalancing. Our methodology is designed to reduce prediction discrepancies and ensure equitable service provision across diverse regions. The effectiveness of our system is evaluated using simulations based on real-world ride-hailing data. The results suggest that our proposed method enhances both accuracy and fairness in forecasting ride-hailing demand, ultimately resulting in more equitable vehicle rebalancing in subsequent operations. Specifically, the algorithm developed in this study effectively reduces the standard deviation and average customer wait times by 6.48% and 0.49%, respectively. This achievement signifies a beneficial outcome for ride-hailing platforms, striking a balance between operational efficiency and fairness.
Abstract:Short-term demand forecasting for on-demand ride-hailing services is one of the fundamental issues in intelligent transportation systems. However, previous travel demand forecasting research predominantly focused on improving prediction accuracy, ignoring fairness issues such as systematic underestimations of travel demand in disadvantaged neighborhoods. This study investigates how to measure, evaluate, and enhance prediction fairness between disadvantaged and privileged communities in spatial-temporal demand forecasting of ride-hailing services. A two-pronged approach is taken to reduce the demand prediction bias. First, we develop a novel deep learning model architecture, named socially aware neural network (SA-Net), to integrate the socio-demographics and ridership information for fair demand prediction through an innovative socially-aware convolution operation. Second, we propose a bias-mitigation regularization method to mitigate the mean percentage prediction error gap between different groups. The experimental results, validated on the real-world Chicago Transportation Network Company (TNC) data, show that the de-biasing SA-Net can achieve better predictive performance in both prediction accuracy and fairness. Specifically, the SA-Net improves prediction accuracy for both the disadvantaged and privileged groups compared with the state-of-the-art models. When coupled with the bias mitigation regularization method, the de-biasing SA-Net effectively bridges the mean percentage prediction error gap between the disadvantaged and privileged groups, and also protects the disadvantaged regions against systematic underestimation of TNC demand. Our proposed de-biasing method can be adopted in many existing short-term travel demand estimation models, and can be utilized for various other spatial-temporal prediction tasks such as crime incidents predictions.
Abstract:Classical demand modeling analyzes travel behavior using only low-dimensional numeric data (i.e. sociodemographics and travel attributes) but not high-dimensional urban imagery. However, travel behavior depends on the factors represented by both numeric data and urban imagery, thus necessitating a synergetic framework to combine them. This study creates a theoretical framework of deep hybrid models with a crossing structure consisting of a mixing operator and a behavioral predictor, thus integrating the numeric and imagery data into a latent space. Empirically, this framework is applied to analyze travel mode choice using the MyDailyTravel Survey from Chicago as the numeric inputs and the satellite images as the imagery inputs. We found that deep hybrid models outperform both the traditional demand models and the recent deep learning in predicting the aggregate and disaggregate travel behavior with our supervision-as-mixing design. The latent space in deep hybrid models can be interpreted, because it reveals meaningful spatial and social patterns. The deep hybrid models can also generate new urban images that do not exist in reality and interpret them with economic theory, such as computing substitution patterns and social welfare changes. Overall, the deep hybrid models demonstrate the complementarity between the low-dimensional numeric and high-dimensional imagery data and between the traditional demand modeling and recent deep learning. It generalizes the latent classes and variables in classical hybrid demand models to a latent space, and leverages the computational power of deep learning for imagery while retaining the economic interpretability on the microeconomics foundation.
Abstract:Although researchers increasingly adopt machine learning to model travel behavior, they predominantly focus on prediction accuracy, ignoring the ethical challenges embedded in machine learning algorithms. This study introduces an important missing dimension - computational fairness - to travel behavior analysis. We first operationalize computational fairness by equality of opportunity, then differentiate between the bias inherent in data and the bias introduced by modeling. We then demonstrate the prediction disparities in travel behavior modeling using the 2017 National Household Travel Survey (NHTS) and the 2018-2019 My Daily Travel Survey in Chicago. Empirically, deep neural network (DNN) and discrete choice models (DCM) reveal consistent prediction disparities across multiple social groups: both over-predict the false negative rate of frequent driving for the ethnic minorities, the low-income and the disabled populations, and falsely predict a higher travel burden of the socially disadvantaged groups and the rural populations than reality. Comparing DNN with DCM, we find that DNN can outperform DCM in prediction disparities because of DNN's smaller misspecification error. To mitigate prediction disparities, this study introduces an absolute correlation regularization method, which is evaluated with synthetic and real-world data. The results demonstrate the prevalence of prediction disparities in travel behavior modeling, and the disparities still persist regarding a variety of model specifics such as the number of DNN layers, batch size and weight initialization. Since these prediction disparities can exacerbate social inequity if prediction results without fairness adjustment are used for transportation policy making, we advocate for careful consideration of the fairness problem in travel behavior modeling, and the use of bias mitigation algorithms for fair transport decisions.