Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bilal Kartal

PredictionNet: Real-Time Joint Probabilistic Traffic Prediction for Planning, Control, and Simulation

Sep 23, 2021

Alexey Kamenev, Lirui Wang, Ollin Boer Bohan, Ishwar Kulkarni, Bilal Kartal, Artem Molchanov, Stan Birchfield, David Nistér, Nikolai Smolyanskiy

Figure 1 for PredictionNet: Real-Time Joint Probabilistic Traffic Prediction for Planning, Control, and Simulation

Figure 2 for PredictionNet: Real-Time Joint Probabilistic Traffic Prediction for Planning, Control, and Simulation

Figure 3 for PredictionNet: Real-Time Joint Probabilistic Traffic Prediction for Planning, Control, and Simulation

Figure 4 for PredictionNet: Real-Time Joint Probabilistic Traffic Prediction for Planning, Control, and Simulation

Abstract:Predicting the future motion of traffic agents is crucial for safe and efficient autonomous driving. To this end, we present PredictionNet, a deep neural network (DNN) that predicts the motion of all surrounding traffic agents together with the ego-vehicle's motion. All predictions are probabilistic and are represented in a simple top-down rasterization that allows an arbitrary number of agents. Conditioned on a multilayer map with lane information, the network outputs future positions, velocities, and backtrace vectors jointly for all agents including the ego-vehicle in a single pass. Trajectories are then extracted from the output. The network can be used to simulate realistic traffic, and it produces competitive results on popular benchmarks. More importantly, it has been used to successfully control a real-world vehicle for hundreds of kilometers, by combining it with a motion planning/control subsystem. The network runs faster than real-time on an embedded GPU, and the system shows good generalization (across sensory modalities and locations) due to the choice of input representation. Furthermore, we demonstrate that by extending the DNN with reinforcement learning (RL), it can better handle rare or unsafe events like aggressive maneuvers and crashes.

* 6 pages, 8 figures, submission to ICRA 2022 conference, for associated video file, see https://youtu.be/C7Nb3DRjFP0

Via

Access Paper or Ask Questions

Work in Progress: Temporally Extended Auxiliary Tasks

Apr 16, 2020

Craig Sherstan, Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor

Figure 1 for Work in Progress: Temporally Extended Auxiliary Tasks

Figure 2 for Work in Progress: Temporally Extended Auxiliary Tasks

Figure 3 for Work in Progress: Temporally Extended Auxiliary Tasks

Figure 4 for Work in Progress: Temporally Extended Auxiliary Tasks

Abstract:Predictive auxiliary tasks have been shown to improve performance in numerous reinforcement learning works, however, this effect is still not well understood. The primary purpose of the work presented here is to investigate the impact that an auxiliary task's prediction timescale has on the agent's policy performance. We consider auxiliary tasks which learn to make on-policy predictions using temporal difference learning. We test the impact of prediction timescale using a specific form of auxiliary task in which the input image is used as the prediction target, which we refer to as temporal difference autoencoders (TD-AE). We empirically evaluate the effect of TD-AE on the A2C algorithm in the VizDoom environment using different prediction timescales. While we do not observe a clear relationship between the prediction timescale on performance, we make the following observations: 1) using auxiliary tasks allows us to reduce the trajectory length of the A2C algorithm, 2) in some cases temporally extended TD-AE performs better than a straight autoencoder, 3) performance with auxiliary tasks is sensitive to the weight placed on the auxiliary loss, 4) despite this sensitivity, auxiliary tasks improved performance without extensive hyper-parameter tuning. Our overall conclusions are that TD-AE increases the robustness of the A2C algorithm to the trajectory length and while promising, further study is required to fully understand the relationship between auxiliary task prediction timescale and the agent's performance.

* Accepted for the Adaptive and Learning Agents (ALA) Workshop at AAMAS 2020

Via

Access Paper or Ask Questions

On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman

Jul 26, 2019

Chao Gao, Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor

Figure 1 for On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman

Figure 2 for On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman

Figure 3 for On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman

Figure 4 for On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman

Abstract:How to best explore in domains with sparse, delayed, and deceptive rewards is an important open problem for reinforcement learning (RL). This paper considers one such domain, the recently-proposed multi-agent benchmark of Pommerman. This domain is very challenging for RL --- past work has shown that model-free RL algorithms fail to achieve significant learning without artificially reducing the environment's complexity. In this paper, we illuminate reasons behind this failure by providing a thorough analysis on the hardness of random exploration in Pommerman. While model-free random exploration is typically futile, we develop a model-based automatic reasoning module that can be used for safer exploration by pruning actions that will surely lead the agent to death. We empirically demonstrate that this module can significantly improve learning.

* AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE) 2019

Via

Access Paper or Ask Questions

Action Guidance with MCTS for Deep Reinforcement Learning

Jul 25, 2019

Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor

Figure 1 for Action Guidance with MCTS for Deep Reinforcement Learning

Figure 2 for Action Guidance with MCTS for Deep Reinforcement Learning

Figure 3 for Action Guidance with MCTS for Deep Reinforcement Learning

Figure 4 for Action Guidance with MCTS for Deep Reinforcement Learning

Abstract:Deep reinforcement learning has achieved great successes in recent years, however, one main challenge is the sample inefficiency. In this paper, we focus on how to use action guidance by means of a non-expert demonstrator to improve sample efficiency in a domain with sparse, delayed, and possibly deceptive rewards: the recently-proposed multi-agent benchmark of Pommerman. We propose a new framework where even a non-expert simulated demonstrator, e.g., planning algorithms such as Monte Carlo tree search with a small number rollouts, can be integrated within asynchronous distributed deep reinforcement learning methods. Compared to a vanilla deep RL algorithm, our proposed methods both learn faster and converge to better policies on a two-player mini version of the Pommerman game.

* AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE'19). arXiv admin note: substantial text overlap with arXiv:1904.05759, arXiv:1812.00045

Via

Access Paper or Ask Questions

Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning

Jul 24, 2019

Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor

Figure 1 for Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning

Figure 2 for Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning

Figure 3 for Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning

Figure 4 for Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning

Abstract:Deep reinforcement learning has achieved great successes in recent years, but there are still open challenges, such as convergence to locally optimal policies and sample inefficiency. In this paper, we contribute a novel self-supervised auxiliary task, i.e., Terminal Prediction (TP), estimating temporal closeness to terminal states for episodic tasks. The intuition is to help representation learning by letting the agent predict how close it is to a terminal state, while learning its control policy. Although TP could be integrated with multiple algorithms, this paper focuses on Asynchronous Advantage Actor-Critic (A3C) and demonstrating the advantages of A3C-TP. Our extensive evaluation includes: a set of Atari games, the BipedalWalker domain, and a mini version of the recently proposed multi-agent Pommerman game. Our results on Atari games and the BipedalWalker domain suggest that A3C-TP outperforms standard A3C in most of the tested domains and in others it has similar performance. In Pommerman, our proposed method provides significant improvement both in learning efficiency and converging to better policies against different opponents.

* AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE'19). arXiv admin note: text overlap with arXiv:1812.00045

Via

Access Paper or Ask Questions

Agent Modeling as Auxiliary Task for Deep Reinforcement Learning

Jul 22, 2019

Pablo Hernandez-Leal, Bilal Kartal, Matthew E. Taylor

Figure 1 for Agent Modeling as Auxiliary Task for Deep Reinforcement Learning

Figure 2 for Agent Modeling as Auxiliary Task for Deep Reinforcement Learning

Figure 3 for Agent Modeling as Auxiliary Task for Deep Reinforcement Learning

Figure 4 for Agent Modeling as Auxiliary Task for Deep Reinforcement Learning

Abstract:In this paper we explore how actor-critic methods in deep reinforcement learning, in particular Asynchronous Advantage Actor-Critic (A3C), can be extended with agent modeling. Inspired by recent works on representation learning and multiagent deep reinforcement learning, we propose two architectures to perform agent modeling: the first one based on parameter sharing, and the second one based on agent policy features. Both architectures aim to learn other agents' policies as auxiliary tasks, besides the standard actor (policy) and critic (values). We performed experiments in both cooperative and competitive domains. The former is a problem of coordinated multiagent object transportation and the latter is a two-player mini version of the Pommerman game. Our results show that the proposed architectures stabilize learning and outperform the standard A3C architecture when learning a best response in terms of expected rewards.

* AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE'19)

Via

Access Paper or Ask Questions

Skynet: A Top Deep RL Agent in the Inaugural Pommerman Team Competition

Apr 20, 2019

Chao Gao, Pablo Hernandez-Leal, Bilal Kartal, Matthew E. Taylor

Figure 1 for Skynet: A Top Deep RL Agent in the Inaugural Pommerman Team Competition

Figure 2 for Skynet: A Top Deep RL Agent in the Inaugural Pommerman Team Competition

Figure 3 for Skynet: A Top Deep RL Agent in the Inaugural Pommerman Team Competition

Figure 4 for Skynet: A Top Deep RL Agent in the Inaugural Pommerman Team Competition

Abstract:The Pommerman Team Environment is a recently proposed benchmark which involves a multi-agent domain with challenges such as partial observability, decentralized execution (without communication), and very sparse and delayed rewards. The inaugural Pommerman Team Competition held at NeurIPS 2018 hosted 25 participants who submitted a team of 2 agents. Our submission nn_team_skynet955_skynet955 won 2nd place of the "learning agents'' category. Our team is composed of 2 neural networks trained with state of the art deep reinforcement learning algorithms and makes use of concepts like reward shaping, curriculum learning, and an automatic reasoning module for action pruning. Here, we describe these elements and additionally we present a collection of open-sourced agents that can be used for training and testing in the Pommerman environment. Code available at: https://github.com/BorealisAI/pommerman-baseline

* 4th Multidisciplinary Conference on Reinforcement Learning and Decision Making

Via

Access Paper or Ask Questions

Safer Deep RL with Shallow MCTS: A Case Study in Pommerman

Apr 10, 2019

Bilal Kartal, Pablo Hernandez-Leal, Chao Gao, Matthew E. Taylor

Figure 1 for Safer Deep RL with Shallow MCTS: A Case Study in Pommerman

Figure 2 for Safer Deep RL with Shallow MCTS: A Case Study in Pommerman

Figure 3 for Safer Deep RL with Shallow MCTS: A Case Study in Pommerman

Figure 4 for Safer Deep RL with Shallow MCTS: A Case Study in Pommerman

Abstract:Safe reinforcement learning has many variants and it is still an open research problem. Here, we focus on how to use action guidance by means of a non-expert demonstrator to avoid catastrophic events in a domain with sparse, delayed, and deceptive rewards: the recently-proposed multi-agent benchmark of Pommerman. This domain is very challenging for reinforcement learning (RL) --- past work has shown that model-free RL algorithms fail to achieve significant learning. In this paper, we shed light into the reasons behind this failure by exemplifying and analyzing the high rate of catastrophic events (i.e., suicides) that happen under random exploration in this domain. While model-free random exploration is typically futile, we propose a new framework where even a non-expert simulated demonstrator, e.g., planning algorithms such as Monte Carlo tree search with small number of rollouts, can be integrated to asynchronous distributed deep reinforcement learning methods. Compared to vanilla deep RL algorithms, our proposed methods both learn faster and converge to better policies on a two-player mini version of the Pommerman game.

* Adaptive Learning Agents (ALA) Workshop at AAMAS 2019. arXiv admin note: substantial text overlap with arXiv:1812.00045

Via

Access Paper or Ask Questions

Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL

Nov 30, 2018

Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor

Figure 1 for Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL

Figure 2 for Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL

Figure 3 for Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL

Figure 4 for Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL

Abstract:Deep reinforcement learning (DRL) has achieved great successes in recent years with the help of novel methods and higher compute power. However, there are still several challenges to be addressed such as convergence to locally optimal policies and long training times. In this paper, firstly, we augment Asynchronous Advantage Actor-Critic (A3C) method with a novel self-supervised auxiliary task, i.e. \emph{Terminal Prediction}, measuring temporal closeness to terminal states, namely A3C-TP. Secondly, we propose a new framework where planning algorithms such as Monte Carlo tree search or other sources of (simulated) demonstrators can be integrated to asynchronous distributed DRL methods. Compared to vanilla A3C, our proposed methods both learn faster and converge to better policies on a two-player mini version of the Pommerman game.

* 9 pages, 6 figures, To appear at AAAI-19 Workshop on Reinforcement Learning in Games

Via

Access Paper or Ask Questions

Is multiagent deep reinforcement learning the answer or the question? A brief survey

Oct 12, 2018

Pablo Hernandez-Leal, Bilal Kartal, Matthew E. Taylor

Figure 1 for Is multiagent deep reinforcement learning the answer or the question? A brief survey

Figure 2 for Is multiagent deep reinforcement learning the answer or the question? A brief survey

Figure 3 for Is multiagent deep reinforcement learning the answer or the question? A brief survey

Figure 4 for Is multiagent deep reinforcement learning the answer or the question? A brief survey

Abstract:Deep reinforcement learning (DRL) has achieved outstanding results in recent years. This has led to a dramatic increase in the number of applications and methods. Recent works have explored learning beyond single-agent scenarios and have considered multiagent scenarios. Initial results report successes in complex multiagent domains, although there are several challenges to be addressed. In this context, first, this article provides a clear overview of current multiagent deep reinforcement learning (MDRL) literature. Second, it provides guidelines to complement this emerging area by (i) showcasing examples on how methods and algorithms from DRL and multiagent learning (MAL) have helped solve problems in MDRL and (ii) providing general lessons learned from these works. We expect this article will help unify and motivate future research to take advantage of the abundant literature that exists in both areas (DRL and MAL) in a joint effort to promote fruitful research in the multiagent community.

* Under review since Oct 2018

Via

Access Paper or Ask Questions