Abstract:Vision language models (VLMs) have demonstrated impressive performance across a wide range of downstream tasks. However, their proficiency in spatial reasoning remains limited, despite its crucial role in tasks involving navigation and interaction with physical environments. Specifically, much of the spatial reasoning in these tasks occurs in two-dimensional (2D) environments, and our evaluation reveals that state-of-the-art VLMs frequently generate implausible and incorrect responses to composite spatial reasoning problems, including simple pathfinding tasks that humans can solve effortlessly at a glance. To address this, we explore an effective approach to enhance 2D spatial reasoning within VLMs by training the model on basic spatial capabilities. We begin by disentangling the key components of 2D spatial reasoning: direction comprehension, distance estimation, and localization. Our central hypothesis is that mastering these basic spatial capabilities can significantly enhance a model's performance on composite spatial tasks requiring advanced spatial understanding and combinatorial problem-solving. To investigate this hypothesis, we introduce Sparkle, a framework that fine-tunes VLMs on these three basic spatial capabilities by synthetic data generation and targeted supervision to form an instruction dataset for each capability. Our experiments demonstrate that VLMs fine-tuned with Sparkle achieve significant performance gains, not only in the basic tasks themselves but also in generalizing to composite and out-of-distribution spatial reasoning tasks (e.g., improving from 13.5% to 40.0% on the shortest path problem). These findings underscore the effectiveness of mastering basic spatial capabilities in enhancing composite spatial problem-solving, offering insights for improving VLMs' spatial reasoning capabilities.
Abstract:Graph Neural Networks deliver strong classification results but often suffer from poor calibration performance, leading to overconfidence or underconfidence. This is particularly problematic in high stakes applications where accurate uncertainty estimates are essential. Existing post hoc methods, such as temperature scaling, fail to effectively utilize graph structures, while current GNN calibration methods often overlook the potential of leveraging diverse input information and model ensembles jointly. In the paper, we propose Graph Ensemble Temperature Scaling, a novel calibration framework that combines input and model ensemble strategies within a Graph Mixture of Experts archi SOTA calibration techniques, reducing expected calibration error by 25 percent across 10 GNN benchmark datasets. Additionally, GETS is computationally efficient, scalable, and capable of selecting effective input combinations for improved calibration performance.
Abstract:Quantifying uncertainty is crucial for robust and reliable predictions. However, existing spatiotemporal deep learning mostly focuses on deterministic prediction, overlooking the inherent uncertainty in such prediction. Particularly, highly-granular spatiotemporal datasets are often sparse, posing extra challenges in prediction and uncertainty quantification. To address these issues, this paper introduces a novel post-hoc Sparsity-awar Uncertainty Calibration (SAUC) framework, which calibrates uncertainty in both zero and non-zero values. To develop SAUC, we firstly modify the state-of-the-art deterministic spatiotemporal Graph Neural Networks (ST-GNNs) to probabilistic ones in the pre-calibration phase. Then we calibrate the probabilistic ST-GNNs for zero and non-zero values using quantile approaches.Through extensive experiments, we demonstrate that SAUC can effectively fit the variance of sparse data and generalize across two real-world spatiotemporal datasets at various granularities. Specifically, our empirical experiments show a 20\% reduction in calibration errors in zero entries on the sparse traffic accident and urban crime prediction. Overall, this work demonstrates the theoretical and empirical values of the SAUC framework, thus bridging a significant gap between uncertainty quantification and spatiotemporal prediction.
Abstract:Transportation mode share analysis is important to various real-world transportation tasks as it helps researchers understand the travel behaviors and choices of passengers. A typical example is the prediction of communities' travel mode share by accounting for their sociodemographics like age, income, etc., and travel modes' attributes (e.g. travel cost and time). However, there exist only limited efforts in integrating the structure of the urban built environment, e.g., road networks, into the mode share models to capture the impacts of the built environment. This task usually requires manual feature engineering or prior knowledge of the urban design features. In this study, we propose deep hybrid models (DHM), which directly combine road networks and sociodemographic features as inputs for travel mode share analysis. Using graph embedding (GE) techniques, we enhance travel demand models with a more powerful representation of urban structures. In experiments of mode share prediction in Chicago, results demonstrate that DHM can provide valuable spatial insights into the sociodemographic structure, improving the performance of travel demand models in estimating different mode shares at the city level. Specifically, DHM improves the results by more than 20\% while retaining the interpretation power of the choice models, demonstrating its superiority in interpretability, prediction accuracy, and geographical insights.
Abstract:In this paper, we for the first time propose the task of Open-domain Urban Itinerary Planning (OUIP) for citywalk, which directly generates itineraries based on users' requests described in natural language. OUIP is different from conventional itinerary planning, which limits users from expressing more detailed needs and hinders true personalization. Recently, large language models (LLMs) have shown potential in handling diverse tasks. However, due to non-real-time information, incomplete knowledge, and insufficient spatial awareness, they are unable to independently deliver a satisfactory user experience in OUIP. Given this, we present ItiNera, an OUIP system that synergizes spatial optimization with Large Language Models (LLMs) to provide services that customize urban itineraries based on users' needs. Specifically, we develop an LLM-based pipeline for extracting and updating POI features to create a user-owned personalized POI database. For each user request, we leverage LLM in cooperation with an embedding-based module for retrieving candidate POIs from the user's POI database. Then, a spatial optimization module is used to order these POIs, followed by LLM crafting a personalized, spatially coherent itinerary. To the best of our knowledge, this study marks the first integration of LLMs to innovate itinerary planning solutions. Extensive experiments on offline datasets and online subjective evaluation have demonstrated the capacities of our system to deliver more responsive and spatially coherent itineraries than current LLM-based solutions. Our system has been deployed in production at the TuTu online travel service and has attracted thousands of users for their urban travel planning.
Abstract:We introduce the BL model and the Perspective Matrix to optimize supplier selection and order allocation, focusing on both temporal and spatial dynamics. Our development of a Supplier Relationship Network, using a Spatio-Temporal Graph Neural Network, enhances the understanding of complex supplier interdependencies. Additionally, we address credibility issues in zero-order scenarios with a Masked Ranking Mechanism, improving supplier ranking efficiency. Our model demonstrates superior results on two datasets compared to the traditional models. Our evaluations using real-world datasets highlight DBLM's superiority in providing accurate predictions and precise confidence intervals, particularly in high-resolution scenarios.
Abstract:The rapid growth of the ride-hailing industry has revolutionized urban transportation worldwide. Despite its benefits, equity concerns arise as underserved communities face limited accessibility to affordable ride-hailing services. A key issue in this context is the vehicle rebalancing problem, where idle vehicles are moved to areas with anticipated demand. Without equitable approaches in demand forecasting and rebalancing strategies, these practices can further deepen existing inequities. In the realm of ride-hailing, three main facets of fairness are recognized: algorithmic fairness, fairness to drivers, and fairness to riders. This paper focuses on enhancing both algorithmic and rider fairness through a novel vehicle rebalancing method. We introduce an approach that combines a Socio-Aware Spatial-Temporal Graph Convolutional Network (SA-STGCN) for refined demand prediction and a fairness-integrated Matching-Integrated Vehicle Rebalancing (MIVR) model for subsequent vehicle rebalancing. Our methodology is designed to reduce prediction discrepancies and ensure equitable service provision across diverse regions. The effectiveness of our system is evaluated using simulations based on real-world ride-hailing data. The results suggest that our proposed method enhances both accuracy and fairness in forecasting ride-hailing demand, ultimately resulting in more equitable vehicle rebalancing in subsequent operations. Specifically, the algorithm developed in this study effectively reduces the standard deviation and average customer wait times by 6.48% and 0.49%, respectively. This achievement signifies a beneficial outcome for ride-hailing platforms, striking a balance between operational efficiency and fairness.
Abstract:Travel behavior prediction is a fundamental task in transportation demand management. The conventional methods for travel behavior prediction rely on numerical data to construct mathematical models and calibrate model parameters to represent human preferences. Recent advancement in large language models (LLMs) has shown great reasoning abilities to solve complex problems. In this study, we propose to use LLMs to predict travel behavior with prompt engineering without data-based parameter learning. Specifically, we carefully design our prompts that include 1) task description, 2) travel characteristics, 3) individual attributes, and 4) guides of thinking with domain knowledge, and ask the LLMs to predict an individual's travel behavior and explain the results. We select the travel mode choice task as a case study. Results show that, though no training samples are provided, LLM-based predictions have competitive accuracy and F1-score as canonical supervised learning methods such as multinomial logit, random forest, and neural networks. LLMs can also output reasons that support their prediction. However, though in most of the cases, the output explanations are reasonable, we still observe cases that violate logic or with hallucinations.
Abstract:Predicting traffic incident risks at granular spatiotemporal levels is challenging. The datasets predominantly feature zero values, indicating no incidents, with sporadic high-risk values for severe incidents. Notably, a majority of current models, especially deep learning methods, focus solely on estimating risk values, overlooking the uncertainties arising from the inherently unpredictable nature of incidents. To tackle this challenge, we introduce the Spatiotemporal Zero-Inflated Tweedie Graph Neural Networks (STZITD-GNNs). Our model merges the reliability of traditional statistical models with the flexibility of graph neural networks, aiming to precisely quantify uncertainties associated with road-level traffic incident risks. This model strategically employs a compound model from the Tweedie family, as a Poisson distribution to model risk frequency and a Gamma distribution to account for incident severity. Furthermore, a zero-inflated component helps to identify the non-incident risk scenarios. As a result, the STZITD-GNNs effectively capture the dataset's skewed distribution, placing emphasis on infrequent but impactful severe incidents. Empirical tests using real-world traffic data from London, UK, demonstrate that our model excels beyond current benchmarks. The forte of STZITD-GNN resides not only in its accuracy but also in its adeptness at curtailing uncertainties, delivering robust predictions over short (7 days) and extended (14 days) timeframes.
Abstract:crucial for transportation management. However, traditional spatial-temporal deep learning models grapple with addressing the sparse and long-tail characteristics in high-resolution O-D matrices and quantifying prediction uncertainty. This dilemma arises from the numerous zeros and over-dispersed demand patterns within these matrices, which challenge the Gaussian assumption inherent to deterministic deep learning models. To address these challenges, we propose a novel approach: the Spatial-Temporal Tweedie Graph Neural Network (STTD). The STTD introduces the Tweedie distribution as a compelling alternative to the traditional 'zero-inflated' model and leverages spatial and temporal embeddings to parameterize travel demand distributions. Our evaluations using real-world datasets highlight STTD's superiority in providing accurate predictions and precise confidence intervals, particularly in high-resolution scenarios.