Abstract:Reinforcement learning (RL) provides a compelling framework for enabling autonomous vehicles to continue to learn and improve diverse driving behaviors on their own. However, training real-world autonomous vehicles with current RL algorithms presents several challenges. One critical challenge, often overlooked in these algorithms, is the need to reset a driving environment between every episode. While resetting an environment after each episode is trivial in simulated settings, it demands significant human intervention in the real world. In this paper, we introduce a novel autonomous algorithm that allows off-the-shelf RL algorithms to train an autonomous vehicle with minimal human intervention. Our algorithm takes into account the learning progress of the autonomous vehicle to determine when to abort episodes before it enters unsafe states and where to reset it for subsequent episodes in order to gather informative transitions. The learning progress is estimated based on the novelty of both current and future states. We also take advantage of rule-based autonomous driving algorithms to safely reset an autonomous vehicle to an initial state. We evaluate our algorithm against baselines on diverse urban driving tasks. The experimental results show that our algorithm is task-agnostic and achieves better driving performance with fewer manual resets than baselines.
Abstract:Autonomous vehicles have been actively investigated over the past few decades. Several recent works show the potential of autonomous driving transportation services in urban environments with impressive experimental results. However, these works note that autonomous vehicles are still occasionally inferior to expert drivers in complex scenarios. Furthermore, they do not focus on the possibilities of autonomous driving transportation services in other areas beyond urban environments. This paper presents the research results and lessons learned from autonomous driving transportation services in airfield, crowded indoor, and urban environments. We discuss how we address several unique challenges in these diverse environments. We also offer an overview of remaining challenges that have not received much attention but must be addressed. This paper aims to share our unique experience to support researchers who are interested in realizing the potential of autonomous vehicles in various real-world environments.
Abstract:Hierarchical reinforcement learning (HRL) has led to remarkable achievements in diverse fields. However, existing HRL algorithms still cannot be applied to real-world navigation tasks. These tasks require an agent to perform safety-aware behaviors and interact with surrounding objects in dynamic environments. In addition, an agent in these tasks should perform consistent and structured exploration as they are long-horizon and have complex structures with diverse objects and task-specific rules. Designing HRL agents that can handle these challenges in real-world navigation tasks is an open problem. In this paper, we propose imagination-augmented HRL (IAHRL), a new and general navigation algorithm that allows an agent to learn safe and interactive behaviors in real-world navigation tasks. Our key idea is to train a hierarchical agent in which a high-level policy infers interactions by interpreting behaviors imagined with low-level policies. Specifically, the high-level policy is designed with a permutation-invariant attention mechanism to determine which low-level policy generates the most interactive behavior, and the low-level policies are implemented with an optimization-based behavior planner to generate safe and structured behaviors following task-specific rules. To evaluate our algorithm, we introduce five complex urban driving tasks, which are among the most challenging real-world navigation tasks. The experimental results indicate that our hierarchical agent performs safety-aware behaviors and properly interacts with surrounding vehicles, achieving higher success rates and lower average episode steps than baselines in urban driving tasks.
Abstract:A significant bottleneck in applying current reinforcement learning algorithms to real-world scenarios is the need to reset the environment between every episode. This reset process demands substantial human intervention, making it difficult for the agent to learn continuously and autonomously. Several recent works have introduced autonomous reinforcement learning (ARL) algorithms that generate curricula for jointly training reset and forward policies. While their curricula can reduce the number of required manual resets by taking into account the agent's learning progress, they rely on task-specific knowledge, such as predefined initial states or reset reward functions. In this paper, we propose a novel ARL algorithm that can generate a curriculum adaptive to the agent's learning progress without task-specific knowledge. Our curriculum empowers the agent to autonomously reset to diverse and informative initial states. To achieve this, we introduce a success discriminator that estimates the success probability from each initial state when the agent follows the forward policy. The success discriminator is trained with relabeled transitions in a self-supervised manner. Our experimental results demonstrate that our ARL algorithm can generate an adaptive curriculum and enable the agent to efficiently bootstrap to solve sparse-reward maze navigation tasks, outperforming baselines with significantly fewer manual resets.
Abstract:There are several unresolved challenges for autonomous vehicles. One of them is safely navigating among occluded pedestrians and vehicles. Much of the previous work tried to solve this problem by generating phantom cars and assessing their risk. In this paper, motivated by the previous works, we propose an algorithm that efficiently assesses risks of phantom pedestrians/vehicles using Simplified Reachability Quantification. We utilized this occlusion risk to set a speed limit at the risky position when planning the velocity profile of an autonomous vehicle. This allows an autonomous vehicle to safely and efficiently drive in occluded areas. The proposed algorithm was evaluated in various scenarios in the CARLA simulator and it reduced the average collision rate by 6.14X, the discomfort score by 5.03X, while traversal time was increased by 1.48X compared to baseline 1, and computation time was reduced by 20.15X compared to baseline 2.