Abstract:Lifelong Multi-Agent Path Finding (LMAPF) is a variant of MAPF where agents are continually assigned new goals, necessitating frequent re-planning to accommodate these dynamic changes. Recently, this field has embraced learning-based methods, which reactively generate single-step actions based on individual local observations. However, it is still challenging for them to match the performance of the best search-based algorithms, especially in large-scale settings. This work proposes an imitation-learning-based LMAPF solver that introduces a novel communication module and systematic single-step collision resolution and global guidance techniques. Our proposed solver, Scalable Imitation Learning for LMAPF (SILLM), inherits the fast reasoning speed of learning-based methods and the high solution quality of search-based methods with the help of modern GPUs. Across six large-scale maps with up to 10,000 agents and varying obstacle structures, SILLM surpasses the best learning- and search-based baselines, achieving average throughput improvements of 137.7% and 16.0%, respectively. Furthermore, SILLM also beats the winning solution of the 2023 League of Robot Runners, an international LMAPF competition sponsored by Amazon Robotics. Finally, we validated SILLM with 10 real robots and 100 virtual robots in a mockup warehouse environment.
Abstract:Autonomous robot exploration requires a robot to efficiently explore and map unknown environments. Compared to conventional methods that can only optimize paths based on the current robot belief, learning-based methods show the potential to achieve improved performance by drawing on past experiences to reason about unknown areas. In this paper, we propose DARE, a novel generative approach that leverages diffusion models trained on expert demonstrations, which can explicitly generate an exploration path through one-time inference. We build DARE upon an attention-based encoder and a diffusion policy model, and introduce ground truth optimal demonstrations for training to learn better patterns for exploration. The trained planner can reason about the partial belief to recognize the potential structure in unknown areas and consider these areas during path planning. Our experiments demonstrate that DARE achieves on-par performance with both conventional and learning-based state-of-the-art exploration planners, as well as good generalizability in both simulations and real-life scenarios.
Abstract:Information sharing is critical in time-sensitive and realistic multi-robot exploration, especially for smaller robotic teams in large-scale environments where connectivity may be sparse and intermittent. Existing methods often overlook such communication constraints by assuming unrealistic global connectivity. Other works account for communication constraints (by maintaining close proximity or line of sight during information exchange), but are often inefficient. For instance, preplanned rendezvous approaches typically involve unnecessary detours resulting from poorly timed rendezvous, while pursuit-based approaches often result in short-sighted decisions due to their greedy nature. We present IR2, a deep reinforcement learning approach to information sharing for multi-robot exploration. Leveraging attention-based neural networks trained via reinforcement and curriculum learning, IR2 allows robots to effectively reason about the longer-term trade-offs between disconnecting for solo exploration and reconnecting for information sharing. In addition, we propose a hierarchical graph formulation to maintain a sparse yet informative graph, enabling our approach to scale to large-scale environments. We present simulation results in three large-scale Gazebo environments, which show that our approach yields 6.6-34.1% shorter exploration paths and significantly improved mapped area consistency among robots when compared to state-of-the-art baselines. Our simulation training and testing code is available at https://github.com/marmotlab/IR2.
Abstract:In multi-agent reinforcement learning (MARL), achieving multi-task generalization to diverse agents and objectives presents significant challenges. Existing online MARL algorithms primarily focus on single-task performance, but their lack of multi-task generalization capabilities typically results in substantial computational waste and limited real-life applicability. Meanwhile, existing offline multi-task MARL approaches are heavily dependent on data quality, often resulting in poor performance on unseen tasks. In this paper, we introduce HyGen, a novel hybrid MARL framework, Hybrid Training for Enhanced Multi-Task Generalization, which integrates online and offline learning to ensure both multi-task generalization and training efficiency. Specifically, our framework extracts potential general skills from offline multi-task datasets. We then train policies to select the optimal skills under the centralized training and decentralized execution paradigm (CTDE). During this stage, we utilize a replay buffer that integrates both offline data and online interactions. We empirically demonstrate that our framework effectively extracts and refines general skills, yielding impressive generalization to unseen tasks. Comparative analyses on the StarCraft multi-agent challenge show that HyGen outperforms a wide range of existing solely online and offline methods.
Abstract:In this paper, we introduce HDPlanner, a deep reinforcement learning (DRL) based framework designed to tackle two core and challenging tasks for mobile robots: autonomous exploration and navigation, where the robot must optimize its trajectory adaptively to achieve the task objective through continuous interactions in unknown environments. Specifically, HDPlanner relies on novel hierarchical attention networks to empower the robot to reason about its belief across multiple spatial scales and sequence collaborative decisions, where our networks decompose long-term objectives into short-term informative task assignments and informative path plannings. We further propose a contrastive learning-based joint optimization to enhance the robustness of HDPlanner. We empirically demonstrate that HDPlanner significantly outperforms state-of-the-art conventional and learning-based baselines on an extensive set of simulations, including hundreds of test maps and large-scale, complex Gazebo environments. Notably, HDPlanner achieves real-time planning with travel distances reduced by up to 35.7% compared to exploration benchmarks and by up to 16.5% than navigation benchmarks. Furthermore, we validate our approach on hardware, where it generates high-quality, adaptive trajectories in both indoor and outdoor environments, highlighting its real-world applicability without additional training.
Abstract:The Multi-agent Path Finding (MAPF) problem involves finding collision-free paths for a team of agents in a known, static environment, with important applications in warehouse automation, logistics, or last-mile delivery. To meet the needs of these large-scale applications, current learning-based methods often deploy the same fully trained, decentralized network to all agents to improve scalability. However, such parameter sharing typically results in homogeneous behaviors among agents, which may prevent agents from breaking ties around symmetric conflict (e.g., bottlenecks) and might lead to live-/deadlocks. In this paper, we propose SYLPH, a novel learning-based MAPF framework aimed to mitigate the adverse effects of homogeneity by allowing agents to learn and dynamically select different social behaviors (akin to individual, dynamic roles), without affecting the scalability offered by parameter sharing. Specifically, SYLPH agents learn to select their Social Value Orientation (SVO) given the situation at hand, quantifying their own level of selfishness/altruism, as well as an SVO-conditioned MAPF policy dictating their movement actions. To these ends, each agent first determines the most influential other agent in the system by predicting future conflicts/interactions with other agents. Each agent selects its own SVO towards that agent, and trains its decentralized MAPF policy to enact this SVO until another agent becomes more influential. To further allow agents to consider each others' social preferences, each agent gets access to the SVO value of their neighbors. As a result of this hierarchical decision-making and exchange of social preferences, SYLPH endows agents with the ability to reason about the MAPF task through more latent spaces and nuanced contexts, leading to varied responses that can help break ties around symmetric conflicts. [...]
Abstract:Communication bandwidth is an important consideration in multi-robot exploration, where information exchange among robots is critical. While existing methods typically aim to reduce communication throughput, they either require significant computation or significantly compromise exploration efficiency. In this work, we propose a deep reinforcement learning framework based on communication and privileged reinforcement learning to achieve a significant reduction in bandwidth consumption, while minimally sacrificing exploration efficiency. Specifically, our approach allows robots to learn to embed the most salient information from their individual belief (partial map) over the environment into fixed-sized messages. Robots then reason about their own belief as well as received messages to distributedly explore the environment while avoiding redundant work. In doing so, we employ privileged learning and learned attention mechanisms to endow the critic (i.e., teacher) network with ground truth map knowledge to effectively guide the policy (i.e., student) network during training. Compared to relevant baselines, our model allows the team to reduce communication by up to two orders of magnitude, while only sacrificing a marginal 2.4\% in total travel distance, paving the way for efficient, distributed multi-robot exploration in bandwidth-limited scenarios.
Abstract:In recent years, the field of aerial robotics has witnessed significant progress, finding applications in diverse domains, including post-disaster search and rescue operations. Despite these strides, the prohibitive acquisition costs associated with deploying physical multi-UAV systems have posed challenges, impeding their widespread utilization in research endeavors. To overcome these challenges, we present STAR (Swarm Technology for Aerial Robotics Research), a framework developed explicitly to improve the accessibility of aerial swarm research experiments. Our framework introduces a swarm architecture based on the Crazyflie, a low-cost, open-source, palm-sized aerial platform, well suited for experimental swarm algorithms. To augment cost-effectiveness and mitigate the limitations of employing low-cost robots in experiments, we propose a landmark-based localization module leveraging fiducial markers. This module, also serving as a target detection module, enhances the adaptability and versatility of the framework. Additionally, collision and obstacle avoidance are implemented through velocity obstacles. The presented work strives to bridge the gap between theoretical advances and tangible implementations, thus fostering progress in the field.
Abstract:Multi-Agent Path Finding (MAPF) is a critical component of logistics and warehouse management, which focuses on planning collision-free paths for a team of robots in a known environment. Recent work introduced a novel MAPF approach, LNS2, which proposed to repair a quickly-obtainable set of infeasible paths via iterative re-planning, by relying on a fast, yet lower-quality, priority-based planner. At the same time, there has been a recent push for Multi-Agent Reinforcement Learning (MARL) based MAPF algorithms, which let agents learn decentralized policies that exhibit improved cooperation over such priority planning, although inevitably remaining slower. In this paper, we introduce a new MAPF algorithm, LNS2+RL, which combines the distinct yet complementary characteristics of LNS2 and MARL to effectively balance their individual limitations and get the best from both worlds. During early iterations, LNS2+RL relies on MARL for low-level re-planning, which we show eliminates collisions much more than a priority-based planner. There, our MARL-based planner allows agents to reason about past and future/predicted information to gradually learn cooperative decision-making through a finely designed curriculum learning. At later stages of planning, LNS2+RL adaptively switches to priority-based planning to quickly resolve the remaining collisions, naturally trading-off solution quality and computational efficiency. Our comprehensive experiments on challenging tasks across various team sizes, world sizes, and map structures consistently demonstrate the superior performance of LNS2+RL compared to many MAPF algorithms, including LNS2, LaCAM, and EECBS, where LNS2+RL shows significantly better performance in complex scenarios. We finally experimentally validate our algorithm in a hybrid simulation of a warehouse mockup involving a team of 100 (real-world and simulated) robots.
Abstract:Animals possess a remarkable ability to navigate challenging terrains, achieved through the interplay of various pathways between the brain, central pattern generators (CPGs) in the spinal cord, and musculoskeletal system. Traditional bioinspired control frameworks often rely on a singular control policy that models both higher (supraspinal) and spinal cord functions. In this work, we build upon our previous research by introducing two distinct neural networks: one tasked with modulating the frequency and amplitude of CPGs to generate the basic locomotor rhythm (referred to as the spinal policy, SCP), and the other responsible for receiving environmental perception data and directly modulating the rhythmic output from the SCP to execute precise movements on challenging terrains (referred to as the descending modulation policy). This division of labor more closely mimics the hierarchical locomotor control systems observed in legged animals, thereby enhancing the robot's ability to navigate various uneven surfaces, including steps, high obstacles, and terrains with gaps. Additionally, we investigate the impact of sensorimotor delays within our framework, validating several biological assumptions about animal locomotion systems. Specifically, we demonstrate that spinal circuits play a crucial role in generating the basic locomotor rhythm, while descending pathways are essential for enabling appropriate gait modifications to accommodate uneven terrain. Notably, our findings also reveal that the multi-layered control inherent in animals exhibits remarkable robustness against time delays. Through these investigations, this paper contributes to a deeper understanding of the fundamental principles of interplay between spinal and supraspinal mechanisms in biological locomotion. It also supports the development of locomotion controllers in parallel to biological structures which are ...