Polytechnique Montreal
Abstract: Swarm intelligence (SI) explores how large groups of simple individuals (e.g., insects, fish, birds) collaborate to produce complex behaviors, exemplifying that the whole is greater than the sum of its parts. A fundamental task in SI is Collective Decision-Making (CDM), where a group selects the best option among several alternatives, such as choosing an optimal foraging site. In this work, we demonstrate a theoretical and empirical equivalence between CDM and single-agent reinforcement learning (RL) in multi-armed bandit problems, utilizing concepts from opinion dynamics, evolutionary game theory, and RL. This equivalence bridges the gap between SI and RL and leads us to introduce a novel abstract RL update rule called Maynard-Cross Learning. Additionally, it provides a new population-based perspective on common RL practices like learning rate adjustment and batching. Our findings enable cross-disciplinary fertilization between RL and SI, allowing techniques from one field to enhance the understanding and methodologies of the other.
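For reference, the classic Cross Learning rule is a well-known multi-armed bandit update whose expected dynamics match the replicator dynamic of evolutionary game theory. The sketch below implements only that classic rule, not the Maynard-Cross Learning rule introduced above; the Bernoulli bandit, its arm probabilities, the learning rate, and the function names are hypothetical choices for illustration.

```python
import random

def cross_learning_step(policy, action, reward, lr=0.1):
    """Classic Cross Learning update; assumes reward in [0, 1] and 0 < lr <= 1."""
    new_policy = []
    for b, p in enumerate(policy):
        if b == action:
            # Reinforce the chosen arm in proportion to the reward received.
            new_policy.append(p + lr * reward * (1.0 - p))
        else:
            # Shrink every other arm by the same relative amount, so the sum stays 1.
            new_policy.append(p - lr * reward * p)
    return new_policy

def run_bandit(arm_probs, steps=5000, lr=0.05, seed=0):
    """Play a Bernoulli bandit (hypothetical arm probabilities) with Cross Learning."""
    rng = random.Random(seed)
    k = len(arm_probs)
    policy = [1.0 / k] * k
    for _ in range(steps):
        action = rng.choices(range(k), weights=policy)[0]
        reward = 1.0 if rng.random() < arm_probs[action] else 0.0
        policy = cross_learning_step(policy, action, reward, lr)
    return policy

if __name__ == "__main__":
    print(run_bandit([0.2, 0.5, 0.8]))
```

Because the update is proportional to the reward, arms that pay more often gain probability mass faster, which is the population-style "fitness-proportional growth" the replicator dynamic describes.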
Abstract: The universe contains many independent co-learning agents that form an ever-evolving part of our observed environment. Yet, in practice, Multi-Agent Reinforcement Learning (MARL) applications are usually constrained to small, homogeneous populations and remain computationally intensive. In this paper, we study how large heterogeneous populations of learning agents evolve in normal-form games. We show how, under assumptions commonly made in the multi-armed bandit literature, Multi-Agent Policy Gradient closely resembles the Replicator Dynamic, and we further derive a fast, parallelizable implementation of Opponent-Learning Awareness tailored for evolutionary simulations. This enables us to simulate the evolution of very large populations of heterogeneous co-learning agents, under both naive and advanced learning strategies. We demonstrate our approach in simulations of 200,000 agents evolving in the classic games of Hawk-Dove, Stag-Hunt, and Rock-Paper-Scissors. Each game highlights a distinct way in which Opponent-Learning Awareness affects evolution.
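To make the Replicator Dynamic concrete, the sketch below integrates it with a simple Euler step for a single population playing Hawk-Dove. The payoff values (V = 2, C = 4), the step size, and the function names are hypothetical; with V < C, the textbook mixed equilibrium has a hawk fraction of V / C = 0.5. This is the population-level dynamic only, not the paper's policy-gradient or Opponent-Learning Awareness implementation.

```python
def replicator_step(x, payoff, dt=0.01):
    """One Euler step of the replicator dynamic for a symmetric normal-form game."""
    n = len(x)
    # Expected payoff of each pure strategy against the current population mix.
    fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Average payoff in the population.
    mean_fitness = sum(x[i] * fitness[i] for i in range(n))
    # Strategies doing better than average grow; the others shrink.
    new_x = [x[i] + dt * x[i] * (fitness[i] - mean_fitness) for i in range(n)]
    total = sum(new_x)
    return [v / total for v in new_x]  # renormalize to remove round-off drift

def hawk_dove_payoff(v=2.0, c=4.0):
    """Row player's payoffs; index 0 = Hawk, 1 = Dove (hypothetical V and C)."""
    return [[(v - c) / 2.0, v],
            [0.0, v / 2.0]]

def simulate(x0, payoff, steps=5000, dt=0.01):
    x = list(x0)
    for _ in range(steps):
        x = replicator_step(x, payoff, dt)
    return x

x = simulate([0.1, 0.9], hawk_dove_payoff())
print(x)
```

Starting from 10% hawks, the population drifts toward the mixed equilibrium because hawks earn above-average payoffs while rare and below-average payoffs while common.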
Abstract: Real-time aerial image segmentation plays an important role in the environmental perception of Uncrewed Aerial Vehicles (UAVs). We introduce BlabberSeg, an optimized Vision-Language Model built on CLIPSeg for on-board, real-time processing of aerial images by UAVs. BlabberSeg improves the efficiency of CLIPSeg by reusing prompt and model features, reducing computational overhead while achieving real-time open-vocabulary aerial segmentation. We validated BlabberSeg in a safe-landing scenario using the Dynamic Open-Vocabulary Enhanced SafE-Landing with Intelligence (DOVESEI) framework, which uses visual servoing and open-vocabulary segmentation. BlabberSeg reduces computational costs significantly, reaching 16.78 Hz on an NVIDIA Jetson Orin AGX (64 GB) compared with 1.81 Hz for the original CLIPSeg, a speed increase of 927.41%, and achieves real-time aerial segmentation with a negligible loss in accuracy (2.1%, measured as the ratio of the correctly segmented area with respect to CLIPSeg). BlabberSeg's source code is open and available online.
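The idea of reusing prompt features can be illustrated with a small cache: the text prompts used for landing-zone segmentation rarely change between frames, so their embeddings only need to be computed once. The sketch below assumes a hypothetical `encode_text` function standing in for the CLIPSeg text encoder (here it returns a toy vector) and counts how many encoder calls actually run; it is not BlabberSeg's implementation.

```python
from functools import lru_cache

ENCODER_CALLS = 0  # counts how often the (expensive) text encoder actually runs

def encode_text(prompt: str) -> tuple:
    """Hypothetical stand-in for a text encoder; returns a toy feature vector."""
    global ENCODER_CALLS
    ENCODER_CALLS += 1
    return tuple(float(ord(ch) % 7) for ch in prompt[:8])

@lru_cache(maxsize=None)
def cached_prompt_features(prompt: str) -> tuple:
    # Prompts are fixed across frames, so each one is encoded only once.
    return encode_text(prompt)

def segment_frame(prompts: tuple) -> list:
    """Hypothetical per-frame step: only the image branch would need to run every frame."""
    text_features = [cached_prompt_features(p) for p in prompts]
    # Image encoding and mask decoding would go here; we return the reused features.
    return text_features

prompts = ("grass", "road", "building", "water")
for _ in range(100):
    segment_frame(prompts)
print(ENCODER_CALLS)  # 4: one call per distinct prompt, not 400
```

The same pattern applies to any intermediate feature that depends only on inputs that stay constant across frames.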
Abstract: Exploration of unknown, unstructured environments, such as in search and rescue, cave exploration, and planetary missions, presents significant challenges due to their unpredictable nature. This unpredictability can lead to inefficient path planning and potential mission failures. We propose a multi-objective risk assessment method for exploration planning in such unconstrained environments. Our approach dynamically adjusts the weights of various risk factors to prevent the robot from undertaking lethal actions too early in the mission. By gradually increasing the allowable risk as the mission progresses, our method enables more efficient exploration. We evaluate risk based on environmental terrain properties, including elevation, slope, roughness, and traversability, and account for factors such as battery life, mission duration, and travel distance. Our method is validated through experiments in various simulated subterranean cave environments. The results demonstrate that our approach ensures consistent exploration without incurring lethal actions, while introducing minimal computational overhead to the planning process.
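A minimal sketch of progress-dependent risk gating, assuming each terrain factor is already normalized to [0, 1] (1 = most dangerous) and assuming a hypothetical linear schedule in which the allowable risk grows with mission progress, where progress is driven by whichever budget (time, distance, battery) is most consumed. The weights, thresholds, and function names are illustrative, not the paper's values or formulation.

```python
from dataclasses import dataclass

@dataclass
class TerrainCell:
    # Each factor is assumed normalized to [0, 1], where 1 is most risky.
    elevation: float
    slope: float
    roughness: float
    traversability_risk: float

# Hypothetical weights for the terrain factors (they sum to 1).
WEIGHTS = {"elevation": 0.2, "slope": 0.3, "roughness": 0.2, "traversability_risk": 0.3}

def terrain_risk(cell: TerrainCell) -> float:
    """Weighted sum of normalized terrain risk factors."""
    return sum(w * getattr(cell, name) for name, w in WEIGHTS.items())

def mission_progress(elapsed_s, max_time_s, distance_m, max_distance_m, battery_frac):
    """Progress in [0, 1], driven by the most-consumed budget (hypothetical rule)."""
    progress = max(elapsed_s / max_time_s, distance_m / max_distance_m, 1.0 - battery_frac)
    return min(max(progress, 0.0), 1.0)

def allowable_risk(progress, start=0.3, end=0.8):
    """Allowed risk rises linearly from `start` to `end` as the mission progresses."""
    return start + (end - start) * progress

def is_admissible(cell, progress):
    return terrain_risk(cell) <= allowable_risk(progress)

steep = TerrainCell(elevation=0.5, slope=0.8, roughness=0.5, traversability_risk=0.6)
print(round(terrain_risk(steep), 2))  # 0.62
print(is_admissible(steep, 0.1), is_admissible(steep, 0.9))  # False True
```

The same steep cell is rejected early in the mission and accepted late, which is the qualitative behavior the abstract describes.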
Abstract: Three-dimensional reconstruction is a fundamental problem in robotics perception. We examine the problem of active view selection to perform 3D Gaussian Splatting reconstructions with as few input images as possible. Although 3D Gaussian Splatting has made significant progress in image rendering and 3D reconstruction, the quality of the reconstruction is strongly impacted by the selection of 2D images and the estimation of camera poses through Structure-from-Motion (SfM) algorithms. Current view-selection methods that rely directly on uncertainties from occlusions, depth ambiguities, or neural network predictions are insufficient to handle this issue and struggle to generalize to new scenes. By ranking the potential views in the frequency domain, we are able to effectively estimate the potential information gain of new viewpoints without ground-truth data. By overcoming current constraints on model architecture and efficacy, our method achieves state-of-the-art results in view selection, demonstrating its potential for efficient image-based 3D reconstruction.
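One way to picture ranking views in the frequency domain is to score each candidate image by the share of its spectral energy at high spatial frequencies, under the assumption that more fine detail indicates more potential information gain. The sketch below is a hypothetical heuristic, not the paper's scoring function: it uses a naive 2D discrete Fourier transform (adequate only for tiny grayscale images), a hypothetical radial `cutoff`, and toy 8x8 images.

```python
import cmath

def dft2(img):
    """Naive 2D DFT of a small grayscale image given as a list of rows (O(N^4))."""
    h, w = len(img), len(img[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    acc += img[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
            out[u][v] = acc
    return out

def high_freq_ratio(img, cutoff=0.25):
    """Fraction of non-DC spectral energy above a normalized radial frequency `cutoff`."""
    h, w = len(img), len(img[0])
    spec = dft2(img)
    high = total = 0.0
    for u in range(h):
        for v in range(w):
            if u == 0 and v == 0:
                continue  # skip the DC term (mean brightness)
            # Map indices to signed frequencies in cycles per pixel, range [-0.5, 0.5).
            fu = (u if u < h / 2 else u - h) / h
            fv = (v if v < w / 2 else v - w) / w
            energy = abs(spec[u][v]) ** 2
            total += energy
            if (fu * fu + fv * fv) ** 0.5 > cutoff:
                high += energy
    return high / total if total > 0 else 0.0

def rank_views(candidates):
    """Return view names sorted from most to least high-frequency content."""
    return sorted(candidates, key=lambda name: high_freq_ratio(candidates[name]), reverse=True)

n = 8
smooth = [[x / n for x in range(n)] for _ in range(n)]          # gentle gradient
checker = [[(x + y) % 2 for x in range(n)] for y in range(n)]   # fine texture
print(rank_views({"smooth": smooth, "checker": checker}))  # ['checker', 'smooth']
```

A real pipeline would use an FFT library on rendered candidate views; the naive transform here only keeps the example dependency-free.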
Abstract: Robotics can help address the growing worker shortage in the manufacturing industry. In particular, machine tending is a task that collaborative robots can take on while substantially boosting productivity. Nevertheless, existing robotic systems deployed in this sector rely on a fixed single-arm setup, whereas mobile robots can provide more flexibility and scalability. In this work, we introduce a learning framework for multi-agent, multi-machine tending by mobile robots, based on Multi-agent Reinforcement Learning (MARL) techniques with a suitably designed observation and reward. Moreover, an attention-based encoding mechanism is developed and integrated into the Multi-agent Proximal Policy Optimization (MAPPO) algorithm to boost its performance in machine tending scenarios. Our model (AB-MAPPO) outperformed MAPPO in this new, challenging scenario in terms of task success, safety, and resource utilization. Furthermore, we provide an extensive ablation study to support our design decisions.
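An attention-based encoding can be pictured as an agent's query vector attending over a variable number of entity embeddings (for example, machines or other robots), producing a fixed-size summary regardless of how many entities are present. The sketch below is plain scaled dot-product attention with hypothetical vector sizes and names; it is not the AB-MAPPO architecture, which would learn the query, key, and value projections inside the policy network.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention of one query over a list of entities."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted average of the entity values: a fixed-size encoding.
    summary = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return summary, weights

# Hypothetical features: one agent's query and three machines' keys and values.
agent_query = [1.0, 0.0]
machine_keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
machine_values = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]
summary, weights = attend(agent_query, machine_keys, machine_values)
print([round(w, 3) for w in weights], summary)
```

Because the summary has the size of one value vector, the policy input stays the same size whether the agent observes two machines or twenty.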
Abstract: Exploration in unknown and unstructured environments is a pivotal requirement for robotic applications. A robot's exploration behavior can be inherently affected by the performance of its Simultaneous Localization and Mapping (SLAM) subsystem, although SLAM and exploration are generally studied separately. In this paper, we formulate exploration as an active mapping problem and extend it with semantic information. We introduce a novel active metric-semantic SLAM approach, leveraging recent research advances in information theory and spectral graph theory: we combine semantic mutual information with the connectivity metrics of the underlying pose graph of the SLAM subsystem. We use the resulting utility function to evaluate candidate trajectories and select the most favorable strategy during exploration. Exploration and SLAM metrics are analyzed in experiments. Running our algorithm on the Habitat dataset, we show that, while maintaining efficiency close to that of state-of-the-art exploration methods, our approach effectively increases the performance of metric-semantic SLAM, with a 21% reduction in average map error and a 9% improvement in average semantic classification accuracy.
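One pose-graph connectivity metric used in the active-SLAM literature is the weighted number of spanning trees, which by Kirchhoff's matrix-tree theorem equals the determinant of the graph Laplacian with one row and column removed. The sketch below computes its logarithm via a Cholesky factorization and adds it to a semantic mutual-information term through a hypothetical weight `lam`; the trade-off form, the edge weights, and the MI values are assumptions, not the paper's utility function.

```python
import math

def laplacian(n, edges):
    """Weighted Laplacian of a pose graph with nodes 0..n-1; edges are (i, j, w)."""
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        L[i][i] += w
        L[j][j] += w
        L[i][j] -= w
        L[j][i] -= w
    return L

def log_spanning_trees(n, edges):
    """Log of the weighted number of spanning trees (matrix-tree theorem).

    Drops node 0's row and column, then takes log det via Cholesky; assumes a
    connected graph with positive weights, so the reduced Laplacian is positive definite.
    """
    L = laplacian(n, edges)
    A = [row[1:] for row in L[1:]]
    m = len(A)
    chol = [[0.0] * m for _ in range(m)]
    log_det = 0.0
    for i in range(m):
        for j in range(i + 1):
            s = A[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][i] = math.sqrt(s)
                log_det += 2.0 * math.log(chol[i][i])
            else:
                chol[i][j] = s / chol[j][j]
    return log_det

def utility(semantic_mi, n, edges, lam=0.5):
    """Hypothetical trade-off: semantic mutual information plus weighted connectivity."""
    return semantic_mi + lam * log_spanning_trees(n, edges)

odometry_only = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
with_loop_closure = odometry_only + [(3, 0, 1.0)]
print(round(math.exp(log_spanning_trees(4, with_loop_closure))))  # 4
```

A chain of odometry edges has exactly one spanning tree, while closing the loop yields four, so a trajectory that revisits known places scores higher at equal semantic information.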
Abstract: In dynamic environments, the ability to detect and track moving objects in real time is crucial for autonomous robots to navigate safely and effectively. Traditional methods for dynamic object detection rely on high-accuracy odometry and maps to detect and track moving objects. However, these methods are not suitable for long-term operation in dynamic environments where the surroundings are constantly changing. To solve this problem, we propose a novel system for detecting and tracking dynamic objects in real time using only LiDAR data. By emphasizing the extraction of low-frequency components from LiDAR data as feature points for foreground objects, our method significantly reduces the time required for object clustering and movement analysis. Additionally, we have developed a tracking approach that employs intensity-based ego-motion estimation along with a sliding window technique to assess object movements. This enables the precise identification of moving objects and enhances the system's resilience to odometry drift. Our experiments show that this system can detect and track dynamic objects in real time with an average detection accuracy of 88.7% and a recall rate of 89.1%. Furthermore, our system demonstrates resilience against the prolonged drift typically associated with front-end-only LiDAR odometry. All of the source code, the labeled dataset, and the annotation tool are available at: https://github.com/MISTLab/lidar_dynamic_objects_detection.git
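A sliding-window movement check can be sketched as follows: for each tracked cluster, keep its last few centroids (assumed to be already expressed in a common frame by the ego-motion estimate) and flag the object as dynamic when its net displacement across the window exceeds a threshold. The window length, threshold, and class name are hypothetical, and the ego-motion compensation is assumed to happen upstream; this is not the repository's implementation.

```python
import math
from collections import deque

class MovingObjectDetector:
    """Hypothetical sliding-window test on ego-motion-compensated cluster centroids."""

    def __init__(self, window=5, min_displacement_m=0.5):
        self.window = window
        self.min_displacement_m = min_displacement_m
        self.history = {}  # track id -> deque of (x, y) centroids

    def update(self, track_id, centroid):
        """Add one centroid and return True if the object is judged to be moving."""
        hist = self.history.setdefault(track_id, deque(maxlen=self.window))
        hist.append(centroid)
        if len(hist) < self.window:
            return False  # not enough evidence yet
        (x0, y0), (x1, y1) = hist[0], hist[-1]
        # Net displacement over the whole window is less sensitive to per-frame jitter.
        return math.hypot(x1 - x0, y1 - y0) > self.min_displacement_m

det = MovingObjectDetector(window=5, min_displacement_m=0.5)
walker = [det.update("walker", (0.3 * t, 0.0)) for t in range(6)]
parked = [det.update("parked", (2.0 + 0.01 * (t % 2), 1.0)) for t in range(6)]
print(walker[-1], parked[-1])  # True False
```

Judging motion over several frames rather than frame to frame is what makes such a check tolerant of small pose errors between consecutive scans.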
Abstract: Traditional reinforcement learning (RL) generates discrete control policies, assigning one action per cycle. These policies are usually implemented in a fixed-frequency control loop. This rigidity presents challenges because the optimal control frequency is task-dependent; suboptimal frequencies increase computational demands and reduce exploration efficiency. Variable Time Step Reinforcement Learning (VTS-RL) addresses these issues with adaptive control frequencies, executing actions only when necessary, thus reducing computational load and extending the action space to include action durations. In this paper, we introduce the Multi-Objective Soft Elastic Actor-Critic (MOSEAC) method to perform VTS-RL, validating it through theoretical analysis and experimentation in simulation and on real robots. Results show faster convergence, better training results, and reduced energy consumption compared with other variable- or fixed-frequency approaches.
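The core idea of VTS-RL, a policy that outputs both an action and how long to hold it, can be sketched with a one-dimensional Newtonian point mass. The environment, the hand-written stand-in policy (a saturated PD controller rather than a learned actor), and all constants below are hypothetical; the sketch only shows how action durations reduce the number of decisions, and it is not MOSEAC.

```python
from dataclasses import dataclass

@dataclass
class PointMass:
    position: float = 0.0
    velocity: float = 0.0

    def apply(self, force, duration, dt=0.01, mass=1.0):
        """Hold a constant force for `duration` seconds, integrating with small substeps."""
        steps = max(1, round(duration / dt))
        for _ in range(steps):
            self.velocity += (force / mass) * dt
            self.position += self.velocity * dt

def variable_step_policy(state, goal):
    """Hypothetical stand-in for a learned policy: returns (force, duration)."""
    error = goal - state.position
    force = max(-1.0, min(1.0, 2.0 * error - 2.0 * state.velocity))
    # Hold actions longer when far from the goal or already settled; react fast otherwise.
    settled = abs(error) < 0.05 and abs(state.velocity) < 0.05
    duration = 0.5 if abs(error) > 0.5 or settled else 0.05
    return force, duration

def run(goal=2.0, horizon_s=10.0):
    env, t, decisions = PointMass(), 0.0, 0
    while t < horizon_s:
        force, duration = variable_step_policy(env, goal)
        env.apply(force, duration)
        t += duration
        decisions += 1
    return env, decisions

env, decisions = run()
print(round(env.position, 3), decisions)
```

For comparison, a fixed loop at the fast 0.05 s period would need 200 decisions to cover the same 10 s horizon; in VTS-RL the duration is learned rather than hand-coded as it is here.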
Abstract: Traditional reinforcement learning (RL) methods typically employ a fixed control loop, where each cycle corresponds to an action. This rigidity poses challenges in practical applications, as the optimal control frequency is task-dependent. A suboptimal choice can lead to high computational demands and reduced exploration efficiency. Variable Time Step Reinforcement Learning (VTS-RL) addresses these issues by using adaptive frequencies for the control loop, executing actions only when necessary. This approach, rooted in reactive programming principles, reduces computational load and extends the action space by including action durations. However, VTS-RL's implementation is often complicated by the need to tune multiple hyperparameters that govern exploration in the multi-objective action-duration space (i.e., balancing task performance and number of time steps to achieve a goal). To overcome these challenges, we introduce the Multi-Objective Soft Elastic Actor-Critic (MOSEAC) method. This method features an adaptive reward scheme that adjusts hyperparameters based on observed trends in task rewards during training. This scheme reduces the complexity of hyperparameter tuning, requiring a single hyperparameter to guide exploration, thereby simplifying the learning process and lowering deployment costs. We validate the MOSEAC method through simulations in a Newtonian kinematics environment, demonstrating high task and training performance with fewer time steps, ultimately lowering energy consumption. This validation shows that MOSEAC streamlines RL algorithm deployment by automatically tuning the agent control loop frequency using a single parameter. Its principles can be applied to enhance any RL algorithm, making it a versatile solution for various applications.
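An adaptive reward scheme of this kind can be illustrated with a hypothetical shaped reward that subtracts a per-step penalty weight from the task reward, where the weight is nudged up or down according to the recent trend in episode task returns and only the adjustment rate `eta` is left to tune. The update rule, window size, and class name below are assumptions made for illustration; they are not MOSEAC's published formulas.

```python
from collections import deque

class AdaptiveStepPenalty:
    """Hypothetical trend-driven weight for a per-decision penalty; `eta` is the only knob."""

    def __init__(self, eta=0.01, window=10, initial_weight=0.0):
        self.eta = eta
        self.weight = initial_weight
        self.recent = deque(maxlen=window)  # recent episode task returns

    def shaped_reward(self, task_reward):
        # Every decision step costs `weight`, so fewer, longer actions are favored.
        return task_reward - self.weight

    def end_episode(self, task_return):
        """Update the weight from the trend of recent task returns."""
        self.recent.append(task_return)
        if len(self.recent) < 2:
            return self.weight
        history = list(self.recent)
        half = len(history) // 2
        older, newer = history[:half], history[half:]
        trend = sum(newer) / len(newer) - sum(older) / len(older)
        # Task improving -> penalize steps more; task stalling or degrading -> back off.
        self.weight = max(0.0, self.weight + self.eta * (1.0 if trend > 0 else -1.0))
        return self.weight

penalty = AdaptiveStepPenalty(eta=0.01)
for ret in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:
    penalty.end_episode(ret)
print(round(penalty.weight, 2))  # 0.09
print(round(penalty.shaped_reward(1.0), 2))  # 0.91
```

The design intent is that pressure toward fewer time steps only grows while the task itself is still being learned successfully, so a single rate parameter replaces a hand-tuned schedule.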