Abstract: Fleets of robo-taxis offering on-demand transportation services, commonly known as Autonomous Mobility-on-Demand (AMoD) systems, hold significant promise for societal benefits, such as reducing pollution, energy consumption, and urban congestion. However, orchestrating these systems at scale remains a critical challenge, with existing coordination algorithms often failing to exploit the systems' full potential. This work introduces a novel decision-making framework that unites mathematical modeling with data-driven techniques. In particular, we cast the AMoD coordination problem through the lens of reinforcement learning and propose a graph network-based framework that exploits the main strengths of graph representation learning, reinforcement learning, and classical operations research tools. Extensive evaluations across diverse simulation fidelities and scenarios demonstrate the flexibility of our approach, achieving superior system performance, computational efficiency, and generalizability compared to prior methods. Finally, motivated by the need to democratize research efforts in this area, we release publicly available benchmarks, datasets, and simulators for network-level coordination, alongside an open-source codebase designed to provide accessible simulation platforms and establish a standardized validation process for comparing methodologies. Code available at: https://github.com/StanfordASL/RL4AMOD
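A minimal sketch of the kind of pipeline this abstract describes, under assumed names and layer sizes (not the authors' exact architecture): a small graph network maps per-region features of an AMoD system (e.g., idle vehicles, open requests) to a desired rebalancing distribution, and a classical OR step (here replaced by simple proportional rounding) turns that distribution into integer vehicle flows.

```python
# Hypothetical graph-RL rebalancing sketch; all names and sizes are assumptions.
import torch
import torch.nn as nn

class GraphRebalancePolicy(nn.Module):
    def __init__(self, in_dim: int, hidden: int = 32):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden)
        self.lin2 = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: [n_regions, in_dim] node features, adj: [n_regions, n_regions]
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        h = torch.relu(self.lin1((adj @ x) / deg))   # one message-passing step
        scores = self.lin2(h).squeeze(-1)            # per-region score
        return torch.softmax(scores, dim=-1)         # desired vehicle share

def desired_to_flows(desired: torch.Tensor, idle_vehicles: int) -> torch.Tensor:
    # Stand-in for the operations research step: in a full pipeline this would
    # be a min-cost flow or LP; here we simply round the target distribution.
    return torch.round(desired * idle_vehicles)

if __name__ == "__main__":
    n, d = 5, 3
    policy = GraphRebalancePolicy(in_dim=d)
    x = torch.rand(n, d)                       # e.g., [idle, demand, queue] per region
    adj = (torch.rand(n, n) > 0.5).float()
    adj = ((adj + adj.T + torch.eye(n)) > 0).float()   # symmetric, with self-loops
    share = policy(x, adj)
    print(desired_to_flows(share, idle_vehicles=20))
```

In a learning loop, the per-region share would parameterize a stochastic policy trained with a standard RL objective, while the OR step guarantees feasible vehicle flows.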
Abstract: Hierarchical policies enable strong performance in many sequential decision-making problems, such as those with high-dimensional action spaces, those requiring long-horizon planning, and settings with sparse rewards. However, learning hierarchical policies from static offline datasets presents a significant challenge. Crucially, actions taken by higher-level policies may not be directly observable within hierarchical controllers, and the offline dataset might have been generated using a different policy structure, hindering the use of standard offline learning algorithms. In this work, we propose OHIO: a framework for offline reinforcement learning (RL) of hierarchical policies. Our framework leverages knowledge of the policy structure to solve the inverse problem, recovering the unobservable high-level actions that likely generated the observed data under our hierarchical policy. This approach constructs a dataset suitable for off-the-shelf offline training. We demonstrate our framework on robotic and network optimization problems and show that it substantially outperforms end-to-end RL methods and improves robustness. We investigate a variety of instantiations of our framework, both for the direct deployment of offline-trained policies and for settings in which online fine-tuning is performed.
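An illustrative sketch of the core idea (toy controller and names are assumptions, not OHIO's actual code): when the low-level controller is known analytically, the unobserved high-level action (here a setpoint) can be recovered from logged state/low-level-action pairs and used to assemble a standard offline RL dataset.

```python
# Toy example of recovering high-level actions by inverting a known controller.
import numpy as np

K_P = 2.0  # gain of the hand-designed low-level proportional controller (assumed known)

def low_level(state: np.ndarray, goal: np.ndarray) -> np.ndarray:
    """u = K_p * (goal - state): the controller assumed known to the learner."""
    return K_P * (goal - state)

def recover_goal(state: np.ndarray, u: np.ndarray) -> np.ndarray:
    """Inverse problem: recover the high-level action that produced u."""
    return state + u / K_P

def relabel_dataset(states: np.ndarray, low_level_actions: np.ndarray) -> dict:
    """Turn a logged low-level dataset into a high-level one for offline RL."""
    goals = recover_goal(states, low_level_actions)
    return {"observations": states, "actions": goals}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = rng.normal(size=(100, 2))
    goals_true = states + rng.normal(scale=0.1, size=(100, 2))
    u_logged = low_level(states, goals_true)            # what the dataset actually contains
    hi_dataset = relabel_dataset(states, u_logged)
    print(np.allclose(hi_dataset["actions"], goals_true))  # True
```

The relabeled dataset can then be fed to any off-the-shelf offline RL algorithm operating at the high level.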
Abstract: Urban mobility is on the cusp of transformation with the emergence of shared, connected, and cooperative automated vehicles. Yet, for them to be accepted by customers, trust in their punctuality is vital. Many pilot initiatives operate without a fixed schedule, thus enhancing the importance of reliable arrival time (AT) predictions. This study presents an AT prediction system for autonomous shuttles, utilizing separate models for dwell and running time predictions, validated on real-world data from five cities. Alongside established methods such as XGBoost, we explore the benefits of integrating spatial data using graph neural networks (GNN). To accurately handle the case of a shuttle bypassing a stop, we propose a hierarchical model combining a random forest classifier and a GNN. The results for the final AT prediction are promising, showing low errors even when predicting several stops ahead. However, no single model emerges as universally superior, and we provide insights into the characteristics of pilot sites that influence the model selection process. Finally, we identify dwell time prediction as the key determinant of overall AT prediction accuracy when autonomous shuttles are deployed in low-traffic areas or under regulatory speed limits. This research provides insights into the current state of autonomous public transport prediction models and paves the way for more data-informed decision-making as the field advances.
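A minimal sketch of the hierarchical dwell-time idea, on assumed synthetic features: a random forest classifier first decides whether the shuttle bypasses a stop (dwell time zero); otherwise a regressor predicts the dwell time. The GNN regressor mentioned above is replaced here by a plain random forest regressor purely for illustration.

```python
# Hierarchical bypass-classifier + dwell-time regressor sketch (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))                  # e.g., time of day, demand, stop features
bypass = (X[:, 0] + rng.normal(scale=0.3, size=500)) > 1.0
dwell = np.where(bypass, 0.0, 20.0 + 5.0 * np.abs(X[:, 1]))   # seconds

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, bypass)
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(
    X[~bypass], dwell[~bypass]
)

def predict_dwell(x: np.ndarray) -> np.ndarray:
    """Hierarchical prediction: 0 s if the stop is predicted to be bypassed."""
    skipped = clf.predict(x).astype(bool)
    out = np.zeros(len(x))
    if (~skipped).any():
        out[~skipped] = reg.predict(x[~skipped])
    return out

print(predict_dwell(X[:5]))
```

The predicted dwell times would then be summed with per-segment running-time predictions to obtain the arrival time several stops ahead.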
Abstract: Autonomous Mobility-on-Demand (AMoD) systems are a rapidly evolving mode of transportation in which a centrally coordinated fleet of self-driving vehicles dynamically serves travel requests. The control of these systems is typically formulated as a large network optimization problem, and reinforcement learning (RL) has recently emerged as a promising approach to solve the open challenges in this space. However, current RL-based approaches focus exclusively on learning from online data, fundamentally ignoring the per-sample cost of interactions within real-world transportation systems. To address these limitations, we propose to formalize the control of AMoD systems through the lens of offline reinforcement learning and learn effective control strategies solely from offline data, which is readily available to current mobility operators. We further investigate design decisions and provide experiments on real-world mobility systems showing how offline learning allows us to recover AMoD control policies that (i) exhibit performance on par with online methods, (ii) drastically improve data efficiency, and (iii) completely eliminate the need for complex simulated environments. Crucially, this paper demonstrates that offline reinforcement learning is a promising paradigm for the application of RL-based solutions within economically critical systems, such as mobility systems.
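A schematic sketch of the offline-RL recipe described above, with assumed names and toy data (not the paper's code): a Q-network over a discretized rebalancing action is trained purely from logged transitions, with a conservative (CQL-style) penalty that keeps Q-values of unseen actions low.

```python
# Offline Q-learning with a conservative penalty on a fixed, logged dataset.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA, ALPHA = 8, 10, 0.99, 1.0

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)

# Logged transitions from a historical operating policy (synthetic stand-in here).
N = 256
s = torch.rand(N, STATE_DIM)
a = torch.randint(0, N_ACTIONS, (N,))
r = torch.rand(N)
s_next = torch.rand(N, STATE_DIM)

for step in range(200):
    q = q_net(s)
    q_sa = q.gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * q_net(s_next).max(dim=1).values
    bellman = ((q_sa - target) ** 2).mean()
    # Conservative term: push down Q on all actions, push up on dataset actions.
    conservative = (torch.logsumexp(q, dim=1) - q_sa).mean()
    loss = bellman + ALPHA * conservative
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.3f}")
```

No simulator is queried at any point; the only ingredients are historical transitions of the kind mobility operators already collect.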