Alexandre Bayen

Reevaluating Policy Gradient Methods for Imperfect-Information Games

Feb 13, 2025

Max Rudolph, Nathan Lichtle, Sobhan Mohammadpour, Alexandre Bayen, J. Zico Kolter, Amy Zhang, Gabriele Farina, Eugene Vinitsky, Samuel Sokota

Abstract: In the past decade, motivated by the putative failure of naive self-play deep reinforcement learning (DRL) in adversarial imperfect-information games, researchers have developed numerous DRL algorithms based on fictitious play (FP), double oracle (DO), and counterfactual regret minimization (CFR). In light of recent results of the magnetic mirror descent algorithm, we hypothesize that simpler generic policy gradient methods like PPO are competitive with or superior to these FP, DO, and CFR-based DRL approaches. To facilitate the resolution of this hypothesis, we implement and release the first broadly accessible exact exploitability computations for four large games. Using these games, we conduct the largest-ever exploitability comparison of DRL algorithms for imperfect-information games. Over 5600 training runs, FP, DO, and CFR-based approaches fail to outperform generic policy gradient methods. Code is available at https://github.com/nathanlct/IIG-RL-Benchmark and https://github.com/gabrfarina/exp-a-spiel .
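To make the exploitability metric concrete, here is a minimal sketch for a two-player zero-sum matrix game. It is not the released code from the repositories above: the function name `exploitability`, the rock-paper-scissors payoff table, and the convention of averaging the two best-response gains are assumptions chosen for illustration.

```python
def exploitability(payoff, x, y):
    """Exploitability of the strategy profile (x, y) in a zero-sum matrix game.

    payoff[i][j] is the row player's payoff when row plays i and column plays j;
    the column player receives the negative. (Illustrative convention: average
    of the two players' best-response values.)
    """
    rows, cols = len(x), len(y)
    # Best the row player could do against the column strategy y.
    br_row = max(sum(payoff[i][j] * y[j] for j in range(cols)) for i in range(rows))
    # Best the column player could do against the row strategy x.
    br_col = max(-sum(payoff[i][j] * x[i] for i in range(rows)) for j in range(cols))
    # In a zero-sum game the current values cancel, so only best responses remain.
    return (br_row + br_col) / 2


# Rock-paper-scissors, actions ordered rock, paper, scissors (hypothetical example).
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]

print(exploitability(RPS, uniform, uniform))
print(exploitability(RPS, always_rock, always_rock))
```

The uniform profile is a Nash equilibrium of rock-paper-scissors, so its exploitability is zero, while always playing rock can be fully exploited by paper. For the large games the abstract refers to, the same quantity requires best responses over full game trees rather than a payoff matrix.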

Scalable Learning of Segment-Level Traffic Congestion Functions

May 09, 2024

Shushman Choudhury, Abdul Rahman Kreidieh, Iveel Tsogsuren, Neha Arora, Carolina Osorio, Alexandre Bayen

Abstract: We propose and study a data-driven framework for identifying traffic congestion functions (numerical relationships between observations of macroscopic traffic variables) at global scale and segment-level granularity. In contrast to methods that estimate a separate set of parameters for each roadway, ours learns a single black-box function over all roadways in a metropolitan area. First, we pool traffic data from all segments into one dataset, combining static attributes with dynamic time-dependent features. Second, we train a feed-forward neural network on this dataset, which we can then use on any segment in the area. We evaluate how well our framework identifies congestion functions on observed segments and how it generalizes to unobserved segments and predicts segment attributes on a large dataset covering multiple cities worldwide. For identification error on observed segments, our single data-driven congestion function compares favorably to segment-specific model-based functions on highway roads, but has room to improve on arterial roads. For generalization, our approach shows strong performance across cities and road types: both on unobserved segments in the same city and on zero-shot transfer learning between cities. Finally, for predicting segment attributes, we find that our approach can approximate critical densities for individual segments using their static properties.
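As a point of reference for what a segment-specific, model-based congestion function and its critical density can look like, the sketch below uses the textbook triangular fundamental diagram. This form is an assumption for illustration; the abstract does not say which model-based functions were used as baselines, and the parameter values are hypothetical.

```python
def triangular_flow(density, free_flow_speed, wave_speed, jam_density):
    """Flow (veh/h) at a given density (veh/km) under a triangular fundamental diagram.

    The free-flow branch rises with slope free_flow_speed; the congested branch
    falls with slope wave_speed until flow reaches zero at jam_density.
    """
    free_branch = free_flow_speed * density
    congested_branch = wave_speed * (jam_density - density)
    return max(0.0, min(free_branch, congested_branch))


def critical_density(free_flow_speed, wave_speed, jam_density):
    """Density at which the two branches meet, i.e. where flow is maximal."""
    return wave_speed * jam_density / (free_flow_speed + wave_speed)


# Hypothetical single-lane segment: 100 km/h free flow, 20 km/h backward wave,
# 150 veh/km jam density.
k_c = critical_density(100.0, 20.0, 150.0)
print(k_c, triangular_flow(k_c, 100.0, 20.0, 150.0))
```

A per-segment approach fits these three parameters separately for every roadway, whereas the framework described above learns one function over all segments and reads off quantities such as the critical density from static segment properties.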

* Submitted to IEEE ITSC 2024

Enabling Mixed Autonomy Traffic Control

Oct 28, 2023

Matthew Nice, Matt Bunting, Alex Richardson, Gergely Zachar, Jonathan W. Lee, Alexandre Bayen, Maria Laura Delle Monache, Benjamin Seibold, Benedetto Piccoli, Jonathan Sprinkle (+1 more)

Abstract: We demonstrate a new capability of automated vehicles: mixed autonomy traffic control. With this new capability, automated vehicles can shape the traffic flows composed of other non-automated vehicles, which has the promise to improve safety, efficiency, and energy outcomes in transportation systems at a societal scale. Investigating mixed autonomy mobile traffic control must be done in situ given that the complex dynamics of other drivers and their response to a team of automated vehicles cannot be effectively modeled. This capability has been blocked because there is no existing scalable and affordable platform for experimental control. This paper introduces an extensible open-source hardware and software platform, enabling a team of 100 vehicles to execute several different vehicular control algorithms as a collaborative fleet, composed of three different makes and models, which drove 22752 miles in a combined 1022 hours, over 5 days in Nashville, TN in November 2022.

So you think you can track?

Sep 13, 2023

Derek Gloudemans, Gergely Zachár, Yanbing Wang, Junyi Ji, Matt Nice, Matt Bunting, William Barbour, Jonathan Sprinkle, Benedetto Piccoli, Maria Laura Delle Monache (+3 more)

Abstract: This work introduces a multi-camera tracking dataset consisting of 234 hours of video data recorded concurrently from 234 overlapping HD cameras covering a 4.2 mile stretch of 8-10 lane interstate highway near Nashville, TN. The video is recorded during a period of high traffic density with 500+ objects typically visible within the scene and typical object longevities of 3-15 minutes. GPS trajectories from 270 vehicle passes through the scene are manually corrected in the video data to provide a set of ground-truth trajectories for recall-oriented tracking metrics, and object detections are provided for each camera in the scene (159 million total before cross-camera fusion). Initial benchmarking of tracking-by-detection algorithms is performed against the GPS trajectories, and a best HOTA of only 9.5% is obtained (best recall 75.9% at IOU 0.1, 47.9 average IDs per ground truth object), indicating the benchmarked trackers do not perform sufficiently well at the long temporal and spatial durations required for traffic scene understanding.
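The recall figure above is quoted at an IOU threshold of 0.1. As a rough illustration of what such a threshold means, the sketch below computes intersection over union for two axis-aligned 2D boxes. The box format `(x1, y1, x2, y2)` and the example coordinates are assumptions; the dataset's own evaluation may use a different box representation.

```python
def box_area(box):
    """Area of an axis-aligned box given as (x1, y1, x2, y2)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = overlap_w * overlap_h
    union = box_area(a) + box_area(b) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical ground-truth and predicted boxes that overlap only at one corner.
ground_truth = (0.0, 0.0, 2.0, 2.0)
prediction = (1.0, 1.0, 3.0, 3.0)
score = iou(ground_truth, prediction)
print(score, score >= 0.1, score >= 0.5)
```

A threshold of 0.1 is lenient: the two boxes in this example count as a match even though they share only a small corner, whereas they would not match at the more common threshold of 0.5.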

Decentralized Vehicle Coordination: The Berkeley DeepDrive Drone Dataset

Sep 22, 2022

Fangyu Wu, Dequan Wang, Minjune Hwang, Chenhui Hao, Jiawei Lu, Jiamu Zhang, Christopher Chou, Trevor Darrell, Alexandre Bayen

Abstract: Decentralized multiagent planning has been an important field of research in robotics. An interesting and impactful application in the field is decentralized vehicle coordination in understructured road environments. For example, in an intersection, it is useful yet difficult to deconflict multiple vehicles with intersecting paths in the absence of a central coordinator. We learn from common sense that, for a vehicle to navigate through such understructured environments, the driver must understand and conform to the implicit "social etiquette" observed by nearby drivers. To study this implicit driving protocol, we collect the Berkeley DeepDrive Drone dataset. The dataset contains 1) a set of aerial videos recording understructured driving, 2) a collection of images and annotations to train vehicle detection models, and 3) a kit of development scripts for illustrating typical usages. We believe that the dataset is of primary interest for studying decentralized multiagent planning employed by human drivers and, of secondary interest, for computer vision in remote sensing settings.

* 6 pages, 10 figures, 1 table

Multi-Adversarial Safety Analysis for Autonomous Vehicles

Dec 29, 2021

Gilbert Bahati, Marsalis Gibson, Alexandre Bayen

Abstract: This work in progress considers reachability-based safety analysis in the domain of autonomous driving in multi-agent systems. We formulate the safety problem for a car-following scenario as a differential game and study how different modelling strategies yield very different behaviors regardless of the validity of the strategies in other scenarios. Given the nature of real-life driving scenarios, we propose a modeling strategy in our formulation that accounts for subtle interactions between agents, and compare its Hamiltonian results to other baselines. Our formulation encourages reduction of conservativeness in Hamilton-Jacobi safety analysis to provide better safety guarantees during navigation.

* 2 pages. 4 figures. RSS 2020 Workshop Robust Autonomy

The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games

Mar 02, 2021

Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre Bayen, Yi Wu

Abstract: Proximal Policy Optimization (PPO) is a popular on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent problems. In this work, we investigate Multi-Agent PPO (MAPPO), a multi-agent PPO variant which adopts a centralized value function. Using a 1-GPU desktop, we show that MAPPO achieves performance comparable to the state-of-the-art in three popular multi-agent testbeds: the Particle World environments, Starcraft II Micromanagement Tasks, and the Hanabi Challenge, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures. In the majority of environments, we find that compared to off-policy baselines, MAPPO achieves better or comparable sample complexity as well as substantially faster running time. Finally, we present 5 factors most influential to MAPPO's practical performance with ablation studies.
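For readers unfamiliar with PPO, the sketch below shows the standard clipped surrogate objective that PPO maximizes. It is a generic illustration, not the authors' MAPPO implementation: the function name and the sample numbers are assumptions, and in a MAPPO-style setup the advantages would be estimated with the centralized value function mentioned in the abstract.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate over a batch.

    ratios[t] is pi_new(a_t | o_t) / pi_old(a_t | o_t); advantages[t] is the
    advantage estimate for the same sample.
    """
    terms = []
    for ratio, advantage in zip(ratios, advantages):
        # Keep the probability ratio within [1 - eps, 1 + eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        terms.append(min(ratio * advantage, clipped_ratio * advantage))
    return sum(terms) / len(terms)


# Hypothetical batch: one sample with positive advantage, one with negative.
print(ppo_clip_objective([1.5, 0.5], [1.0, -1.0]))
```

With a positive advantage, the gain from raising the action probability is capped once the ratio exceeds 1 + eps; with a negative advantage, the same happens once the ratio falls below 1 - eps. This removes the incentive for a single update to move the policy far from the one that collected the data.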

A Graph Convolutional Network with Signal Phasing Information for Arterial Traffic Prediction

Dec 25, 2020

Victor Chan, Qijian Gan, Alexandre Bayen

Abstract: Accurate and reliable prediction of traffic measurements plays a crucial role in the development of modern intelligent transportation systems. Due to more complex road geometries and the presence of signal control, arterial traffic prediction is a level above freeway traffic prediction. Many existing studies on arterial traffic prediction only consider temporal measurements of flow and occupancy from loop sensors and neglect the rich spatial relationships between upstream and downstream detectors. As a result, they often suffer large prediction errors, especially for long horizons. We fill this gap by enhancing a deep learning approach, Diffusion Convolutional Recurrent Neural Network, with spatial information generated from signal timing plans at targeted intersections. Traffic at signalized intersections is modeled as a diffusion process with a transition matrix constructed from the phase splits of the signal phase timing plan. We apply this novel method to predict traffic flow from loop sensor measurements and signal timing plans at an arterial intersection in Arcadia, CA. We demonstrate that our proposed method yields superior forecasts; for a prediction horizon of 30 minutes, we cut the MAPE down to 16% for morning peaks, 10% for off peaks, and even 8% for afternoon peaks. In addition, we exemplify the robustness of our model through a number of experiments with various settings in detector coverage, detector type, and data quality.
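The idea of a diffusion process driven by signal phase splits can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's construction: it assumes each upstream detector sends flow to downstream detectors in proportion to the green time serving that movement, and the detector layout and green times are hypothetical.

```python
def transition_matrix(green_seconds):
    """Row-normalize green times so each non-empty row sums to 1.

    green_seconds[i][j] is the green time (s) per cycle for the movement from
    detector i to detector j (assumed weighting, for illustration only).
    """
    matrix = []
    for row in green_seconds:
        total = sum(row)
        matrix.append([g / total if total > 0 else 0.0 for g in row])
    return matrix


def diffuse(signal, matrix):
    """One diffusion step: the value at node i moves to node j with weight matrix[i][j]."""
    n = len(matrix)
    return [sum(signal[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# Hypothetical layout: detector 0 is upstream of the intersection; detector 1 is
# on the through approach (30 s green), detector 2 on the left turn (10 s green).
green = [[0.0, 30.0, 10.0],
         [0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0]]
P = transition_matrix(green)
print(diffuse([100.0, 0.0, 0.0], P))
```

In a diffusion convolutional layer, repeated applications of such a matrix (and its reverse-direction counterpart) to the sensor readings provide the spatial features that the recurrent network then processes over time.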

* 10 pages, 7 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

Dec 03, 2020

Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine

Abstract: A wide range of reinforcement learning (RL) problems - including robustness, transfer learning, unsupervised RL, and emergent complexity - require specifying a distribution of tasks or environments in which a policy will be trained. However, creating a useful distribution of environments is error-prone, and takes a significant amount of developer time and effort. We propose Unsupervised Environment Design (UED) as an alternative paradigm, where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments. Existing approaches to automatically generating environments suffer from common failure modes: domain randomization cannot generate structure or adapt the difficulty of the environment to the agent's learning progress, and minimax adversarial training leads to worst-case environments that are often unsolvable. To generate structured, solvable environments for our protagonist agent, we introduce a second, antagonist agent that is allied with the environment-generating adversary. The adversary is motivated to generate environments which maximize regret, defined as the difference between the protagonist and antagonist agents' returns. We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED). Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.
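The regret objective can be illustrated with a toy calculation. The sketch below is not the PAIRED training loop: it assumes the adversary simply picks from a fixed set of candidate environments, and the environment names and returns are hypothetical.

```python
def regret(protagonist_return, antagonist_return):
    """Regret as the antagonist's return minus the protagonist's return."""
    return antagonist_return - protagonist_return


def pick_adversary_environment(candidates):
    """Return the name of the candidate environment with the highest regret.

    candidates maps an environment name to (protagonist_return, antagonist_return).
    """
    return max(candidates, key=lambda name: regret(*candidates[name]))


# Hypothetical returns in three candidate environments.
candidates = {
    "trivial": (1.0, 1.0),      # both agents succeed: regret 0
    "impossible": (0.0, 0.0),   # nobody can succeed: regret 0
    "challenging": (0.2, 0.9),  # solvable, but the protagonist struggles
}
print(pick_adversary_environment(candidates))
```

Maximizing regret steers the adversary away from both trivial and unsolvable environments, since neither creates a gap between the two agents. A pure minimax adversary, which only minimizes the protagonist's return, would prefer the unsolvable environment.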

Optimizing Mixed Autonomy Traffic Flow With Decentralized Autonomous Vehicles and Multi-Agent RL

Oct 30, 2020

Eugene Vinitsky, Nathan Lichtle, Kanaad Parvate, Alexandre Bayen

Abstract: We study the ability of autonomous vehicles to improve the throughput of a bottleneck using a fully decentralized control scheme in a mixed autonomy setting. We consider the problem of improving the throughput of a scaled model of the San Francisco-Oakland Bay Bridge: a two-stage bottleneck where four lanes reduce to two and then reduce to one. Although there is extensive work examining variants of bottleneck control in a centralized setting, there is less study of the challenging multi-agent setting where the large number of interacting AVs leads to significant optimization difficulties for reinforcement learning methods. We apply multi-agent reinforcement learning algorithms to this problem and demonstrate that significant improvements in bottleneck throughput, from 20% at a 5% penetration rate to 33% at a 40% penetration rate, can be achieved. We compare our results to a hand-designed feedback controller and demonstrate that our results sharply outperform the feedback controller despite extensive tuning. Additionally, we demonstrate that the RL-based controllers adopt a robust strategy that works across penetration rates whereas the feedback controllers degrade immediately upon penetration rate variation. We investigate the feasibility of both action and observation decentralization and demonstrate that effective strategies are possible using purely local sensing. Finally, we open-source our code at https://github.com/eugenevinitsky/decentralized_bottlenecks.
