Abstract:Understanding the relationships between geometric structures and semantic concepts is crucial for building accurate models of complex environments. Indoors, certain spatial constraints, such as the relative positioning of planes, remain consistent despite variations in layout. This paper explores how these invariant relationships can be captured in a graph SLAM framework by representing high-level concepts like rooms and walls and linking them to geometric elements like planes through an optimizable factor graph. Several efforts have tackled this issue with ad hoc solutions for each concept generation and with manually defined factors. This paper proposes a novel method for metric-semantic factor graph generation which includes defining a semantic scene graph, integrating geometric information, and learning the interconnecting factors, all based on Graph Neural Networks (GNNs). An edge classification network (G-GNN) sorts the edges between planes into same-room, same-wall, or none types. The resulting relations are clustered, generating a room or wall for each cluster. A second family of networks (F-GNN) infers the geometrical origin of the new nodes. The definition of the factors employs the same F-GNN used for the metric attribute of the generated nodes. Furthermore, we share the new factor graph with the S-Graphs+ algorithm, extending its graph expressiveness and scene representation with the ultimate goal of improving the SLAM performance. The complexity of the environments is increased to N-plane rooms by training the networks on L-shaped rooms. The framework is evaluated in synthetic and simulated scenarios, as no real datasets of the required complex layouts are available.
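The step from classified edges to room or wall nodes can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm is used, so the version below simply assumes connected components over the edges carrying a given label, computed with a union-find structure. The function name `cluster_planes`, the label strings, and the integer plane ids are all hypothetical.

```python
from collections import defaultdict


def cluster_planes(num_planes, labelled_edges, relation):
    """Group plane ids linked by edges carrying the given relation label.

    labelled_edges: iterable of (plane_a, plane_b, label) tuples, where the
    label is assumed to come from an edge classifier ("same_room",
    "same_wall" or "none"). Returns a sorted list of clusters; each cluster
    would spawn one room or wall node.
    """
    parent = list(range(num_planes))

    def find(i):
        # Find the cluster representative, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    linked = set()
    for a, b, label in labelled_edges:
        if label == relation:
            parent[find(a)] = find(b)  # merge the two clusters
            linked.update((a, b))

    groups = defaultdict(list)
    for plane in sorted(linked):
        groups[find(plane)].append(plane)
    return sorted(groups.values())


# Hypothetical classifier output for eight planes.
edges = [
    (0, 1, "same_room"), (1, 2, "same_room"), (2, 3, "same_room"),
    (3, 4, "none"), (4, 5, "same_wall"), (6, 7, "same_room"),
]
rooms = cluster_planes(8, edges, "same_room")
walls = cluster_planes(8, edges, "same_wall")
```

Connected components is the simplest choice that turns pairwise relations into groups; a single misclassified edge merges two rooms, so a real system would likely need something more robust.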
Abstract:RGB-D cameras supply rich and dense visual and spatial information for various robotics tasks such as scene understanding, map reconstruction, and localization. Integrating depth and visual information can aid robots in localization and element mapping, advancing applications like 3D scene graph generation and Visual Simultaneous Localization and Mapping (VSLAM). While point cloud data containing such information is primarily used for enhanced scene understanding, exploiting its potential to capture and represent rich semantic information has yet to be adequately targeted. This paper presents a real-time pipeline for localizing building components, including wall and ground surfaces, by integrating geometric calculations for pure 3D plane detection followed by validating their semantic category using point cloud data from RGB-D cameras. It uses a parallel multi-threaded architecture to precisely estimate the poses and equations of all the planes detected in the environment, filters the ones forming the map structure using a panoptic segmentation validation, and keeps only the validated building components. Incorporating the proposed method into a VSLAM framework confirmed that constraining the map with the detected environment-driven semantic elements can improve scene understanding and map reconstruction accuracy. It can also ensure (re-)association of these detected components into a unified 3D scene graph, bridging the gap between geometric accuracy and semantic understanding. Additionally, the pipeline allows for the detection of potential higher-level structural entities, such as rooms, by identifying the relationships between building components based on their layout.
Abstract:Having prior knowledge of an environment boosts the localization and mapping accuracy of robots. Several approaches in the literature have utilized architectural plans in this regard. However, almost all of them overlook the deviations between actual as-built environments and as-planned architectural designs, introducing bias in the estimations. To address this issue, we present a novel localization and mapping method called deviations-informed Situational Graphs (diS-Graphs), which integrates prior knowledge from architectural plans even in the presence of deviations. It is based on Situational Graphs (S-Graphs) that merge geometric models of the environment with 3D scene graphs into a multi-layered jointly optimizable factor graph. Our diS-Graph extracts information from architectural plans by first modeling them as a hierarchical factor graph, which we call an Architectural Graph (A-Graph). While the robot explores the real environment, it estimates an S-Graph from its onboard sensors. We then use a novel matching algorithm to register the A-Graph and S-Graph in the same reference frame, and merge both of them with an explicit model of deviations. Finally, an alternating graph optimization strategy allows simultaneous global localization and mapping, as well as deviation estimation between the A-Graph and the S-Graph. We perform several experiments in simulated and real datasets in the presence of deviations. On average, our diS-Graphs outperform the baselines by a margin of approximately 43% in simulated environments and by 7% in real environments, while being able to estimate deviations up to 35 cm and 15 degrees.
Abstract:This paper provides a structured and practical roadmap for practitioners to integrate Learning from Demonstration (LfD) into manufacturing tasks, with a specific focus on industrial manipulators. Motivated by the paradigm shift from mass production to mass customization, it is crucial to have an easy-to-follow roadmap for practitioners with moderate expertise to transform existing robotic processes into customizable LfD-based solutions. To realize this transformation, we devise the key questions of "What to Demonstrate", "How to Demonstrate", "How to Learn", and "How to Refine". To work through these questions, our comprehensive guide offers a questionnaire-style approach, highlighting key steps from problem definition to solution refinement. The paper equips both researchers and industry professionals with actionable insights to deploy LfD-based solutions effectively. By tailoring the refinement criteria to manufacturing settings, the paper addresses related challenges and strategies for enhancing LfD performance in manufacturing contexts.
Abstract:5G New Radio Time of Arrival (ToA) data has the potential to revolutionize indoor localization for micro aerial vehicles (MAVs). However, its performance under varying network setups, especially when combined with IMU data for real-time localization, has not been fully explored so far. In this study, we develop an error state Kalman filter (ESKF) and a pose graph optimization (PGO) approach to address this gap. We systematically evaluate the performance of the derived approaches for real-time MAV localization in realistic scenarios with 5G base stations in Line-Of-Sight (LOS), demonstrating the potential of 5G technologies in this domain. In order to experimentally test and compare our localization approaches, we augment the EuRoC MAV benchmark dataset for visual-inertial odometry with simulated yet highly realistic 5G ToA measurements. Our experimental results comprehensively assess the impact of varying network setups, including varying base station numbers and network configurations, on ToA-based MAV localization performance. The findings show promising results for seamless and robust localization using 5G ToA measurements, achieving an accuracy of 15 cm throughout the entire trajectory within a graph-based framework with five 5G base stations, and an accuracy of up to 34 cm in the case of ESKF-based localization. Additionally, we measure the run time of both algorithms and show that they are both fast enough for real-time implementation.
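To make the role of ToA measurements concrete, here is a minimal sketch of how times of arrival from several base stations constrain a position. It is not the ESKF or the PGO of the paper: it assumes noise-free, clock-synchronised ToA values, converts them to ranges with the speed of light, and solves for a static 3D position with a plain Gauss-Newton least-squares loop. The names `locate_from_toa` and `solve3`, the station layout, and the initial guess are all hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            factor = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= factor * M[col][c]
    x = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        s = M[i][3] - sum(M[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = s / M[i][i]
    return x


def locate_from_toa(stations, toas, guess, iterations=20):
    """Gauss-Newton estimate of a 3D position from ToA ranges."""
    p = list(guess)
    for _ in range(iterations):
        JTJ = [[0.0] * 3 for _ in range(3)]
        JTr = [0.0] * 3
        for s, t in zip(stations, toas):
            diff = [p[i] - s[i] for i in range(3)]
            dist = math.sqrt(sum(d * d for d in diff))
            residual = dist - C * t           # predicted minus measured range
            grad = [d / dist for d in diff]   # Jacobian row of the range
            for i in range(3):
                JTr[i] += grad[i] * residual
                for j in range(3):
                    JTJ[i][j] += grad[i] * grad[j]
        step = solve3(JTJ, [-v for v in JTr])  # normal equations
        p = [p[i] + step[i] for i in range(3)]
    return p


# Hypothetical layout: five base stations, one of them off the ceiling plane.
stations = [(0, 0, 3), (10, 0, 3), (10, 8, 3), (0, 8, 3), (5, 4, 0.5)]
truth = (2.0, 3.0, 1.5)
toas = [math.dist(truth, s) / C for s in stations]
estimate = locate_from_toa(stations, toas, guess=(5.0, 4.0, 2.0))
```

With real measurements the residuals would be noisy and time-varying, which is where a filter or a pose graph that also fuses IMU data, as described in the abstract, becomes necessary.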
Abstract:The aim of this study is to investigate an automated industrial manipulation pipeline in which assembly tasks can be flexibly adapted to production without the need for a robotics expert, both for the vision system and the robot program. The objective is, first, to develop a synthetic-dataset-generation pipeline with a special focus on industrial parts and, second, to use Learning-from-Demonstration (LfD) methods to replace manual robot programming, so that a non-expert in robotics, such as a process engineer, can introduce a new manipulation task by teaching it to the robot.
Abstract:This paper introduces a novel Nussbaum function-based PID control for robotic manipulators. The integration of the Nussbaum function into the PID framework provides a solution with a simple structure that effectively tackles the challenge of unknown control directions. Stability is achieved through a combination of neural network-based estimation and Lyapunov analysis, facilitating automatic gain adjustment without the need for system dynamics. Our approach offers a gain determination with minimum parameter requirements, significantly reducing the complexity and enhancing the efficiency of robotic manipulator control. The paper guarantees that all signals within the closed-loop system remain bounded. Lastly, numerical simulations validate the theoretical framework, confirming the effectiveness of the proposed control strategy in enhancing robotic manipulator control.
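The idea of handling an unknown control direction with a Nussbaum function can be illustrated on a textbook scalar example. This is not the PID controller of the paper and involves no neural network: it assumes a plain integrator plant x' = b*u whose gain sign is unknown, the common Nussbaum-type function N(k) = k^2 cos(k), and the classic adaptive law u = N(k)*x, k' = x^2, integrated with forward Euler. The function names, step size, and duration are hypothetical choices.

```python
import math


def nussbaum(k):
    """A commonly used Nussbaum-type function, N(k) = k^2 * cos(k)."""
    return k * k * math.cos(k)


def simulate(b, x0=1.0, dt=1e-3, duration=30.0):
    """Regulate x' = b*u to zero without knowing the sign of b.

    Control law (assumed textbook form): u = N(k) * x, with k' = x^2.
    Returns the final state x and the final adaptation variable k.
    """
    x, k = x0, 0.0
    for _ in range(int(duration / dt)):
        u = nussbaum(k) * x
        x_dot = b * u
        k_dot = x * x
        x += dt * x_dot  # forward Euler step for the plant
        k += dt * k_dot  # forward Euler step for the adaptation
    return x, k


# The same controller is applied to both possible control directions.
x_pos, k_pos = simulate(b=1.0)
x_neg, k_neg = simulate(b=-1.0)
```

The variable k keeps growing while x is nonzero, so N(k) sweeps through both signs with increasing magnitude until it settles on one that drives x to zero; the price is a transient in which the state can temporarily grow.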
Abstract:Aerial robots play a vital role in various applications where the situational awareness of the robots concerning the environment is a fundamental demand. As one such use case, drones in GPS-denied environments must be equipped with different sensors (e.g., vision sensors) that provide reliable sensing results while performing pose estimation and localization. This paper targets reconstructing the maps of indoor environments and generating 3D scene graphs for a high-level representation using a camera mounted on a drone. Accordingly, an aerial robot equipped with a companion computer and an RGB-D camera was built and integrated with a Visual Simultaneous Localization and Mapping (VSLAM) framework proposed by the authors. To enhance the situational awareness of the robot while reconstructing maps, various structural elements, including doors and walls, were labeled with printed fiducial markers, and a dictionary of the topological relations among them was fed to the system. The VSLAM system detects markers and reconstructs the map of the indoor areas enriched with higher-level semantic entities, including corridors and rooms. Another achievement is generating multi-layered vision-based situational graphs containing enhanced hierarchical representations of the indoor environment. In this regard, integrating VSLAM into the employed drone is the primary target of this paper to provide an end-to-end robot application for GPS-denied environments. To show the practicality of the system, various experiments under real-world conditions have been conducted in indoor scenarios with dissimilar structural layouts. Evaluations show the proposed drone application can perform adequately w.r.t. the ground-truth data and its baseline.
Abstract:Collaborative Simultaneous Localization and Mapping (CSLAM) is critical to enable multiple robots to operate in complex environments. Most CSLAM techniques rely on raw sensor measurements or low-level features such as keyframe descriptors, which can lead to wrong loop closures due to the lack of deep understanding of the environment. Moreover, the exchange of these measurements and low-level features among the robots requires the transmission of a significant amount of data, which limits the scalability of the system. To overcome these limitations, we present Multi S-Graphs, a decentralized CSLAM system that utilizes high-level semantic-relational information embedded in the four-layered hierarchical and optimizable situational graphs for cooperative map generation and localization while minimizing the information exchanged between the robots. To support this, we present a novel room-based descriptor which, along with its connected walls, is used to perform inter-robot loop closures, addressing the challenges of multi-robot kidnapped problem initialization. Multiple experiments in simulated and real environments validate the improvement in accuracy and robustness of the proposed approach while reducing the amount of data exchanged between robots compared to other state-of-the-art approaches. Software available within a docker image: https://github.com/snt-arg/multi_s_graphs_docker
Abstract:Recent works on SLAM extend their pose graphs with higher-level semantic concepts and exploit the relationships between them, not only to provide a richer representation of the situation/environment but also to improve the accuracy of its estimation. Concretely, our previous work, Situational Graphs (S-Graphs), a pioneer in jointly leveraging semantic relationships in the factor optimization process, relies on semantic entities such as wall surfaces and rooms, whose relationship is mathematically defined. Nevertheless, extracting these high-level concepts relying exclusively on the lower-level factor graph remains a challenge and is currently done with ad hoc algorithms, which limits its capability to include new semantic-relational concepts. To overcome this limitation, in this work, we propose a Graph Neural Network (GNN) for learning high-level semantic-relational concepts that can be inferred from the low-level factor graph. We demonstrate that we can infer room entities and their relationship to the mapped wall surfaces more accurately and more computationally efficiently than the baseline algorithm. Additionally, to demonstrate the versatility of our method, we provide a new semantic concept, i.e., wall, and its relationship with its wall surfaces. Our proposed method has been integrated into S-Graphs+, and it has been validated in both simulated and real datasets. A docker container with our software will be made available to the scientific community.