Abstract:Unprecedented agility and dexterous manipulation have been demonstrated with controllers based on deep reinforcement learning (RL), with a significant impact on legged and humanoid robots. Modern tooling and simulation platforms, such as NVIDIA Isaac Sim, have been enabling such advances. This article focuses on demonstrating the applications of Isaac in local planning and obstacle avoidance as one of the most fundamental ways in which a mobile robot interacts with its environments. Although there is extensive research on proprioception-based RL policies, the article highlights less standardized and reproducible approaches to exteroception. At the same time, the article aims to provide a base framework for end-to-end local navigation policies and how a custom robot can be trained in such simulation environment. We benchmark end-to-end policies with the state-of-the-art Nav2, navigation stack in Robot Operating System (ROS). We also cover the sim-to-real transfer process by demonstrating zero-shot transferability of policies trained in the Isaac simulator to real-world robots. This is further evidenced by the tests with different simulated robots, which show the generalization of the learned policy. Finally, the benchmarks demonstrate comparable performance to Nav2, opening the door to quick deployment of state-of-the-art end-to-end local planners for custom robot platforms, but importantly furthering the possibilities by expanding the state and action spaces or task definitions for more complex missions. Overall, with this article we introduce the most important steps, and aspects to consider, in deploying RL policies for local path planning and obstacle avoidance with Isaac Sim training, Gazebo testing, and ROS 2 for real-time inference in real robots. The code is available at https://github.com/sahars93/RL-Navigation.
Abstract:Federated learning (FL) has become one of the key methods for privacy-preserving collaborative learning, as it enables the transfer of models without requiring local data exchange. Within the FL framework, an aggregation algorithm is recognized as one of the most crucial components for ensuring the efficacy and security of the system. Existing average aggregation algorithms typically assume that all client-trained data holds equal value or that weights are based solely on the quantity of data contributed by each client. In contrast, alternative approaches involve training the model locally after aggregation to enhance adaptability. However, these approaches fundamentally ignore the inherent heterogeneity between different clients' data and the complexity of variations in data at the aggregation stage, which may lead to a suboptimal global model. To address these issues, this study proposes a novel dual-criterion weighted aggregation algorithm involving the quantity and quality of data from the client node. Specifically, we quantify the data used for training and perform multiple rounds of local model inference accuracy evaluation on a specialized dataset to assess the data quality of each client. These two factors are utilized as weights within the aggregation process, applied through a dynamically weighted summation of these two factors. This approach allows the algorithm to adaptively adjust the weights, ensuring that every client can contribute to the global model, regardless of their data's size or initial quality. Our experiments show that the proposed algorithm outperforms several existing state-of-the-art aggregation approaches on both a general-purpose open-source dataset, CIFAR-10, and a dataset specific to visual obstacle avoidance.
Abstract:Event cameras, inspired by biological vision, are asynchronous sensors that detect changes in brightness, offering notable advantages in environments characterized by high-speed motion, low lighting, or wide dynamic range. These distinctive properties render event cameras particularly effective for sensor fusion in robotics and computer vision, especially in enhancing traditional visual or LiDAR-inertial odometry. Conventional frame-based cameras suffer from limitations such as motion blur and drift, which can be mitigated by the continuous, low-latency data provided by event cameras. Similarly, LiDAR-based odometry encounters challenges related to the loss of geometric information in environments such as corridors. To address these limitations, unlike the existing event camera-related surveys, this paper presents a comprehensive overview of recent advancements in event-based sensor fusion for odometry applications particularly, investigating fusion strategies that incorporate frame-based cameras, inertial measurement units (IMUs), and LiDAR. The survey critically assesses the contributions of these fusion methods to improving odometry performance in complex environments, while highlighting key applications, and discussing the strengths, limitations, and unresolved challenges. Additionally, it offers insights into potential future research directions to advance event-based sensor fusion for next-generation odometry applications.
Abstract:In recent years, Light Detection and Ranging (LiDAR) technology, a critical sensor in robotics and autonomous systems, has seen significant advancements. These improvements include enhanced resolution of point clouds and the capability to provide 360{\deg} low-resolution images. These images encode various data such as depth, reflectivity, and near-infrared light within the pixels. However, an excessive density of points and conventional point cloud sampling can be counterproductive, particularly in applications such as LiDAR odometry, where misleading points and degraded geometry information may induce drift errors. Currently, extensive research efforts are being directed towards leveraging LiDAR-generated images to improve situational awareness. This paper presents a comprehensive review of current deep learning (DL) techniques, including colorization and super-resolution, which are traditionally utilized in conventional computer vision tasks. These techniques are applied to LiDAR-generated images and are analyzed qualitatively. Based on this analysis, we have developed a novel approach that selectively integrates the most suited colorization and super-resolution methods with LiDAR imagery to sample reliable points from the LiDAR point cloud. This approach aims to not only improve the accuracy of point cloud registration but also avoid mismatching caused by lacking geometry information, thereby augmenting the utility and precision of LiDAR systems in practical applications. In our evaluation, the proposed approach demonstrates superior performance compared to our previous work, achieving lower translation and rotation errors with a reduced number of points.
Abstract:Human pose estimation involves detecting and tracking the positions of various body parts using input data from sources such as images, videos, or motion and inertial sensors. This paper presents a novel approach to human pose estimation using machine learning algorithms to predict human posture and translate them into robot motion commands using ultra-wideband (UWB) nodes, as an alternative to motion sensors. The study utilizes five UWB sensors implemented on the human body to enable the classification of still poses and more robust posture recognition. This approach ensures effective posture recognition across a variety of subjects. These range measurements serve as input features for posture prediction models, which are implemented and compared for accuracy. For this purpose, machine learning algorithms including K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and deep Multi-Layer Perceptron (MLP) neural network are employed and compared in predicting corresponding postures. We demonstrate the proposed approach for real-time control of different mobile/aerial robots with inference implemented in a ROS 2 node. Experimental results demonstrate the efficacy of the approach, showcasing successful prediction of human posture and corresponding robot movements with high accuracy.
Abstract:The increased data transmission and number of devices involved in communications among distributed systems make it challenging yet significantly necessary to have an efficient and reliable networking middleware. In robotics and autonomous systems, the wide application of ROS\,2 brings the possibility of utilizing various networking middlewares together with DDS in ROS\,2 for better communication among edge devices or between edge devices and the cloud. However, there is a lack of comprehensive communication performance comparison of integrating these networking middlewares with ROS\,2. In this study, we provide a quantitative analysis for the communication performance of utilized networking middlewares including MQTT and Zenoh alongside DDS in ROS\,2 among a multiple host system. For a complete and reliable comparison, we calculate the latency and throughput of these middlewares by sending distinct amounts and types of data through different network setups including Ethernet, Wi-Fi, and 4G. To further extend the evaluation to real-world application scenarios, we assess the drift error (the position changes) over time caused by these networking middlewares with the robot moving in an identical square-shaped path. Our results show that CycloneDDS performs better under Ethernet while Zenoh performs better under Wi-Fi and 4G. In the actual robot test, the robot moving trajectory drift error over time (96\,s) via Zenoh is the smallest. It is worth noting we have a discussion of the CPU utilization of these networking middlewares and the performance impact caused by enabling the security feature in ROS\,2 at the end of the paper.
Abstract:Keypoint detection and description play a pivotal role in various robotics and autonomous applications including visual odometry (VO), visual navigation, and Simultaneous localization and mapping (SLAM). While a myriad of keypoint detectors and descriptors have been extensively studied in conventional camera images, the effectiveness of these techniques in the context of LiDAR-generated images, i.e. reflectivity and ranges images, has not been assessed. These images have gained attention due to their resilience in adverse conditions such as rain or fog. Additionally, they contain significant textural information that supplements the geometric information provided by LiDAR point clouds in the point cloud registration phase, especially when reliant solely on LiDAR sensors. This addresses the challenge of drift encountered in LiDAR Odometry (LO) within geometrically identical scenarios or where not all the raw point cloud is informative and may even be misleading. This paper aims to analyze the applicability of conventional image key point extractors and descriptors on LiDAR-generated images via a comprehensive quantitative investigation. Moreover, we propose a novel approach to enhance the robustness and reliability of LO. After extracting key points, we proceed to downsample the point cloud, subsequently integrating it into the point cloud registration phase for the purpose of odometry estimation. Our experiment demonstrates that the proposed approach has comparable accuracy but reduced computational overhead, higher odometry publishing rate, and even superior performance in scenarios prone to drift by using the raw point cloud. This, in turn, lays a foundation for subsequent investigations into the integration of LiDAR-generated images with LO. Our code is available on GitHub: https://github.com/TIERS/ws-lidar-as-camera-odom.
Abstract:As multi-robot systems continue to advance and become integral to various applications, managing conflicts and ensuring secure access control are critical challenges that need to be addressed. Access control is essential in multi-robot systems to ensure secure and authorized interactions among robots, protect sensitive data, and prevent unauthorized access to resources. This paper presents a novel framework for customizable conflict resolution and attribute-based access control in multi-robot systems for ROS 2 leveraging the Hyperledger Fabric blockchain. We introduce an attribute-based access control (ABAC) Fabric-ROS 2 bridge to enable secure communication and control between users and robots. By defining conflict resolution policies based on task priorities, robot capabilities, and user-defined constraints, our framework offers a flexible way to resolve conflicts. Additionally, it incorporates attribute-based access control, granting access rights based on user and robot attributes. ABAC offers a modular approach to control access compared to existing access control approaches in ROS 2, such as SROS2. Through this framework, multi-robot systems can be managed efficiently, securely, and adaptably, ensuring controlled access to resources and managing conflicts. Our experimental evaluation shows that our framework marginally improves latency and throughput over exiting Fabric and ROS 2 integration solutions. At higher network load, it is the only solution to operate reliably without a diverging transaction commitment latency. We also demonstrate how conflicts arising from simultaneous control or a robot by two users are resolved in real-time and motion distortion is effectively eliminated.
Abstract:The remarkable growth of unmanned aerial vehicles (UAVs) has also sparked concerns about safety measures during their missions. To advance towards safer autonomous aerial robots, this work presents a vision-based solution to ensuring safe autonomous UAV landings with minimal infrastructure. During docking maneuvers, UAVs pose a hazard to people in the vicinity. In this paper, we propose the use of a single omnidirectional panoramic camera pointing upwards from a landing pad to detect and estimate the position of people around the landing area. The images are processed in real-time in an embedded computer, which communicates with the onboard computer of approaching UAVs to transition between landing, hovering or emergency landing states. While landing, the ground camera also aids in finding an optimal position, which can be required in case of low-battery or when hovering is no longer possible. We use a YOLOv7-based object detection model and a XGBooxt model for localizing nearby people, and the open-source ROS and PX4 frameworks for communication, interfacing, and control of the UAV. We present both simulation and real-world indoor experimental results to show the efficiency of our methods.
Abstract:Ultra-wideband (UWB) positioning has emerged as a low-cost and dependable localization solution for multiple use cases, from mobile robots to asset tracking within the Industrial IoT. The technology is mature and the scientific literature contains multiple datasets and methods for localization based on fixed UWB nodes. At the same time, research in UWB-based relative localization and infrastructure-free localization is gaining traction, further domains. tools and datasets in this domain are scarce. Therefore, we introduce in this paper a novel dataset for benchmarking infrastructure-free relative localization targeting the domain of multi-robot systems. Compared to previous datasets, we analyze the performance of different relative localization approaches for a much wider variety of scenarios with varying numbers of fixed and mobile nodes. A motion capture system provides ground truth data, are multi-modal and include inertial or odometry measurements for benchmarking sensor fusion methods. Additionally, the dataset contains measurements of ranging accuracy based on the relative orientation of antennas and a comprehensive set of measurements for ranging between a single pair of nodes. Our experimental analysis shows that high accuracy can be localization, but the variability of the ranging error is significant across different settings and setups.