Abstract:Accurate, efficient, and robust state estimation is more important than ever in robotics as the variety of platforms and complexity of tasks continue to grow. Historically, discrete-time filters and smoothers have been the dominant approach, in which the estimated variables are states at discrete sample times. The paradigm of continuous-time state estimation proposes an alternative strategy by estimating variables that express the state as a continuous function of time, which can be evaluated at any query time. Not only can this benefit downstream tasks such as planning and control, but it also significantly increases estimator performance and flexibility, as well as reduces sensor preprocessing and interfacing complexity. Despite this, continuous-time methods remain underutilized, potentially because they are less well-known within robotics. To remedy this, this work presents a unifying formulation of these methods and the most exhaustive literature review to date, systematically categorizing prior work by methodology, application, state variables, historical context, and theoretical contribution to the field. By surveying splines and Gaussian processes together and contextualizing works from other research domains, this work identifies and analyzes open problems in continuous-time state estimation and suggests new research directions.
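To illustrate what it means to express the state as a continuous function of time, the following minimal sketch evaluates a uniform cubic B-spline at an arbitrary query time. It is an illustration only, not the formulation presented in the paper: the state is a scalar, the knot spacing is uniform, and the names `bspline_basis`, `evaluate_spline`, `t0`, and `dt` are assumptions made for this example. Practical estimators typically use vector-valued or Lie-group states.

```python
def bspline_basis(u):
    """Uniform cubic B-spline blending weights for a local coordinate u in [0, 1]."""
    return (
        (1.0 - u) ** 3 / 6.0,
        (3.0 * u**3 - 6.0 * u**2 + 4.0) / 6.0,
        (-3.0 * u**3 + 3.0 * u**2 + 3.0 * u + 1.0) / 6.0,
        u**3 / 6.0,
    )


def evaluate_spline(control_points, t0, dt, t):
    """Evaluate a scalar uniform cubic B-spline at query time t.

    control_points: the estimated variables (hypothetical scalar values here).
    t0: start time of the first segment; dt: knot spacing (both assumed).
    The spline is defined for t in [t0, t0 + (len(control_points) - 3) * dt].
    """
    n_segments = len(control_points) - 3
    if n_segments < 1:
        raise ValueError("a cubic B-spline needs at least four control points")
    s = (t - t0) / dt
    if not 0.0 <= s <= n_segments:
        raise ValueError("query time lies outside the spline's support")
    i = min(int(s), n_segments - 1)  # index of the active segment
    u = s - i                        # local coordinate within the segment
    weights = bspline_basis(u)
    # Each query blends exactly four neighbouring control points.
    return sum(w * c for w, c in zip(weights, control_points[i:i + 4]))
```

In an estimator, the control points would be the optimization variables, and each sensor measurement would be compared against the spline evaluated at that measurement's own timestamp.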
Abstract:The precise and safe control of heavy material handling machines presents numerous challenges due to the hard-to-model hydraulically actuated joints and the need for collision-free trajectory planning with a free-swinging end-effector tool. In this work, we propose an RL-based controller that commands the cabin joint and the arm simultaneously. It is trained in a simulation combining data-driven modeling techniques with first-principles modeling. On the one hand, we employ a neural network model to capture the highly nonlinear dynamics of the upper carriage turn hydraulic motor, incorporating explicit pressure prediction to handle delays better. On the other hand, we model the arm as velocity-controllable and the free-swinging end-effector tool as a damped pendulum using first principles. This combined model enhances our simulation environment, enabling the training of RL controllers that can be directly transferred to the real machine. Designed to reach steady-state Cartesian targets, the RL controller learns to leverage the hydraulic dynamics to improve accuracy, maintain high speeds, and minimize end-effector tool oscillations. Our controller, tested on a mid-size prototype material handler, is more accurate than an inexperienced operator and causes fewer tool oscillations. It demonstrates competitive performance even compared to an experienced professional driver.
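The first-principles part of such a simulation can be sketched as a planar damped pendulum whose pivot is accelerated horizontally by the arm. This is a minimal sketch under stated assumptions, not the authors' simulator: the function name `simulate_damped_pendulum`, the viscous damping term, the semi-implicit Euler integrator, and all parameter values are hypothetical choices made for illustration.

```python
import math


def simulate_damped_pendulum(theta0, omega0, length, damping, dt, steps,
                             pivot_accel=0.0, g=9.81):
    """Integrate a planar damped pendulum and return the angle history.

    theta is measured from the downward vertical (rad), omega is its rate.
    damping is a viscous coefficient (1/s); pivot_accel is a constant
    horizontal acceleration of the attachment point (m/s^2). All parameters
    are assumptions for this sketch.
    """
    theta, omega = theta0, omega0
    history = [theta]
    for _ in range(steps):
        # Angular acceleration: gravity + viscous damping + pivot motion.
        alpha = (-(g / length) * math.sin(theta)
                 - damping * omega
                 - (pivot_accel / length) * math.cos(theta))
        # Semi-implicit Euler: update the rate first, then the angle.
        omega += alpha * dt
        theta += omega * dt
        history.append(theta)
    return history
```

For example, `simulate_damped_pendulum(0.3, 0.0, 2.0, 0.5, 0.001, 20000)` simulates 20 s of free swing from an initial deflection of 0.3 rad, over which the oscillation decays toward rest.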
Abstract:Construction sites are challenging environments for autonomous systems due to their unstructured nature and the presence of dynamic actors, such as workers and machinery. This work presents a comprehensive panoptic scene understanding solution designed to handle the complexities of such environments by integrating 2D panoptic segmentation with 3D LiDAR mapping. Our system generates detailed environmental representations in real-time by combining semantic and geometric data, supported by Kalman Filter-based tracking for dynamic object detection. We introduce a fine-tuning method that adapts large pre-trained panoptic segmentation models for construction site applications using a limited number of domain-specific samples. For this use case, we release a first-of-its-kind dataset of 502 hand-labeled sample images with panoptic annotations from construction sites. In addition, we propose a dynamic panoptic mapping technique that enhances scene understanding in unstructured environments. As a case study, we demonstrate the system's application for autonomous navigation, utilizing real-time RRT* for reactive path planning in dynamic scenarios. The dataset (https://leggedrobotics.github.io/panoptic-scene-understanding.github.io/) and code (https://github.com/leggedrobotics/rsl_panoptic_mapping) for training and deployment are publicly available to support future research.
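As a rough illustration of Kalman Filter-based tracking, the sketch below runs a constant-velocity filter along a single axis with position-only measurements. It is not the tracker from the released code: the `Track1D` class, the `predict` and `update` functions, the white-noise-acceleration process model, and all noise values are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Track1D:
    """State [pos, vel] with a symmetric 2x2 covariance (p00, p01, p11)."""
    pos: float
    vel: float
    p00: float
    p01: float
    p11: float


def predict(track, dt, accel_var):
    """Propagate the track with a constant-velocity model over dt seconds."""
    # Process noise of a white-noise-acceleration model (assumed).
    q00 = accel_var * dt**4 / 4.0
    q01 = accel_var * dt**3 / 2.0
    q11 = accel_var * dt**2
    return Track1D(
        pos=track.pos + track.vel * dt,
        vel=track.vel,
        p00=track.p00 + 2.0 * dt * track.p01 + dt**2 * track.p11 + q00,
        p01=track.p01 + dt * track.p11 + q01,
        p11=track.p11 + q11,
    )


def update(track, measured_pos, meas_var):
    """Fuse one position measurement (measurement matrix H = [1, 0])."""
    innovation = measured_pos - track.pos
    s = track.p00 + meas_var   # innovation variance
    k0 = track.p00 / s         # Kalman gain for position
    k1 = track.p01 / s         # Kalman gain for velocity
    return Track1D(
        pos=track.pos + k0 * innovation,
        vel=track.vel + k1 * innovation,
        p00=(1.0 - k0) * track.p00,
        p01=(1.0 - k0) * track.p01,
        p11=track.p11 - k1 * track.p01,
    )
```

Calling `predict` and then `update` once per frame lets the velocity estimate separate moving objects from static structure, even though only positions are observed.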
Abstract:The ICP registration algorithm has been a preferred method for LiDAR-based robot localization for nearly a decade. However, even in modern SLAM solutions, ICP can degrade and become unreliable in geometrically ill-conditioned environments. Current solutions primarily focus on utilizing additional sources of information, such as external odometry, to either replace the degenerate directions of the optimization solution or add constraints in a sensor-fusion setup afterward. In response, this work investigates and compares new and existing degeneracy mitigation methods for robust LiDAR-based localization and analyzes the efficacy of these approaches in degenerate environments for the first time in the literature at this scale. Specifically, this work proposes and investigates i) the incorporation of different types of constraints into the ICP algorithm, ii) the effect of using active or passive degeneracy mitigation techniques, and iii) the choice of utilizing global point cloud registration methods on the ill-conditioned ICP problem in LiDAR degenerate environments. The study results are validated through multiple real-world field and simulated experiments. The analysis shows that active optimization degeneracy mitigation is necessary and advantageous in the absence of reliable external estimate assistance for LiDAR-SLAM. Furthermore, introducing degeneracy-aware hard constraints before or during the optimization is shown to perform better in the wild than adding the constraints afterward. Moreover, with heuristically fine-tuned parameters, soft constraints can provide equal or better results in complex ill-conditioned scenarios. The implementations used in the analysis of this work are made publicly available to the community.
Abstract:Reconstructing the 3D shape of a deformable environment from the information captured by a moving depth camera is highly relevant to surgery. The underlying challenge is the fact that simultaneously estimating camera motion and tissue deformation in a fully deformable scene is an ill-posed problem, especially from a single arbitrarily moving viewpoint. Current solutions are often organ-specific and lack the robustness required to handle large deformations. Here we propose a multi-viewpoint global optimization framework that can flexibly integrate the output of low-level perception modules (data association, depth, and relative scene flow) with kinematic and scene-modeling priors to jointly estimate multiple camera motions and absolute scene flow. We use simulated noisy data to show three practical examples that successfully constrain the convergence to a unique solution. Overall, our method shows robustness to combined noisy input measures and can process hundreds of points in a few milliseconds. MultiViPerFrOG builds a generalized learning-free scaffolding for spatio-temporal encoding that can unlock advanced surgical scene representations and will facilitate the development of the computer-assisted-surgery technologies of the future.
Abstract:Autonomous navigation at high speeds in off-road environments requires robots to comprehensively understand their surroundings using onboard sensing only. The extreme conditions posed by the off-road setting can cause degraded camera image quality due to poor lighting and motion blur, as well as limited, sparse geometric information from LiDAR sensing when driving at high speeds. In this work, we present RoadRunner, a novel framework capable of predicting terrain traversability and an elevation map directly from camera and LiDAR sensor inputs. RoadRunner enables reliable autonomous navigation by fusing sensory information, handling uncertainty, and generating contextually informed predictions about the geometry and traversability of the terrain while operating at low latency. In contrast to existing methods relying on classifying handcrafted semantic classes and using heuristics to predict traversability costs, our method is trained end-to-end in a self-supervised fashion. The RoadRunner network architecture builds upon popular sensor fusion network architectures from the autonomous driving domain, which embed LiDAR and camera information into a common Bird's Eye View perspective. Training is enabled by utilizing an existing traversability estimation stack to generate training data in hindsight in a scalable manner from real-world off-road driving datasets. Furthermore, RoadRunner improves the system latency by a factor of roughly 4, from 500 ms to 140 ms, while improving the accuracy for traversability costs and elevation map predictions. We demonstrate the effectiveness of RoadRunner in enabling safe and reliable off-road navigation at high speeds in multiple real-world driving scenarios through unstructured desert environments.
Abstract:New sensing technologies and more advanced processing algorithms are transforming computer-integrated surgery. While researchers are actively investigating depth sensing and 3D reconstruction for vision-based surgical assistance, it remains difficult to achieve real-time, accurate, and robust 3D representations of the abdominal cavity for minimally invasive surgery. Thus, this work uses quantitative testing on fresh ex-vivo porcine tissue to thoroughly characterize the quality with which a 3D laser-based time-of-flight sensor (lidar) can perform anatomical surface reconstruction. Ground-truth surface shapes are captured with a commercial laser scanner, and the resulting signed error fields are analyzed using rigorous statistical tools. When compared to modern learning-based stereo matching from endoscopic images, time-of-flight sensing demonstrates higher precision, lower processing delay, higher frame rate, and superior robustness against sensor distance and poor illumination. Furthermore, we report on the potential negative effect of near-infrared light penetration on the accuracy of lidar measurements across different tissue samples, identifying a significant measured depth offset for muscle in contrast to fat and liver. Our findings highlight the potential of lidar for intraoperative 3D perception and point toward new methods that combine complementary time-of-flight and spectral imaging.
Abstract:Real-time path planning in outdoor environments still challenges modern robotic systems due to differences in terrain traversability, diverse obstacles, and the necessity for fast decision-making. Established approaches have primarily focused on geometric navigation solutions, which work well for structured geometric obstacles but have limitations regarding the semantic interpretation of different terrain types and their affordances. Moreover, these methods fail to identify traversable geometric occurrences, such as stairs. To overcome these issues, we introduce ViPlanner, a learned local path planning approach that generates local plans based on geometric and semantic information. The system is trained using the Imperative Learning paradigm, for which the network weights are optimized end-to-end based on the planning task objective. This optimization uses a differentiable formulation of a semantic costmap, which enables the planner to distinguish between the traversability of different terrains and accurately identify obstacles. The semantic information is represented in 30 classes using an RGB colorspace that can effectively encode the multiple levels of traversability. We show that the planner can adapt to diverse real-world environments without requiring any real-world training. In fact, the planner is trained purely in simulation, enabling a highly scalable training data generation. Experimental results demonstrate resistance to noise, zero-shot sim-to-real transfer, and a decrease of 38.02% in terms of traversability cost compared to purely geometric-based approaches. Code and models are made publicly available: https://github.com/leggedrobotics/viplanner.
Abstract:Modern robotic systems are required to operate in challenging environments that demand reliable localization. LiDAR-based localization methods, such as the Iterative Closest Point (ICP) algorithm, can suffer in geometrically uninformative environments that are known to deteriorate registration performance and push optimization toward divergence along weakly constrained directions. To overcome this issue, this work proposes i) a robust multi-category (non-)localizability detection module, and ii) a localizability-aware constrained ICP optimization module, and couples both in a unified manner. The proposed localizability detection is achieved by utilizing the correspondences between the scan and the map to analyze the alignment strength against the principal directions of the optimization as part of its multi-category LiDAR localizability analysis. In the second part, this localizability analysis is then tightly integrated into the scan-to-map point cloud registration to generate drift-free pose updates along well-constrained directions. The proposed method is thoroughly evaluated and compared to state-of-the-art methods in simulation and during real-world experiments, underlining the gain in performance and reliability in LiDAR-challenging scenarios. In all experiments, the proposed framework demonstrates accurate and generalizable localizability detection and robust pose estimation without environment-specific parameter tuning.
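The notion of alignment strength along a direction can be sketched for the translational part of a point-to-plane registration problem. There, the translational block of the Gauss-Newton Hessian is the sum of the outer products of the correspondence normals, so the constraint strength along a unit direction d is the sum of the squared projections of the normals onto d. The code below is a simplified illustration, not the paper's multi-category detection module: it checks fixed, user-supplied directions instead of the principal directions of the optimization, ignores rotation, and uses a hypothetical threshold `min_strength`. The function names are likewise assumptions.

```python
def direction_strength(normals, direction):
    """Sum of squared projections of unit normals onto a unit direction.

    Equals d^T (sum_i n_i n_i^T) d, the translational point-to-plane
    constraint strength along d.
    """
    dx, dy, dz = direction
    return sum((nx * dx + ny * dy + nz * dz) ** 2 for nx, ny, nz in normals)


def classify_directions(normals, directions, min_strength):
    """Flag each named direction as localizable (True) or not (False).

    min_strength is a hypothetical, hand-picked threshold for this sketch.
    """
    return {
        name: direction_strength(normals, d) >= min_strength
        for name, d in directions.items()
    }


# Toy tunnel along x: every surface normal points sideways or upward,
# so no correspondence constrains translation along the tunnel axis.
tunnel_normals = ([(0.0, 1.0, 0.0)] * 50
                  + [(0.0, -1.0, 0.0)] * 50
                  + [(0.0, 0.0, 1.0)] * 50)
axes = {"x": (1.0, 0.0, 0.0), "y": (0.0, 1.0, 0.0), "z": (0.0, 0.0, 1.0)}
result = classify_directions(tunnel_normals, axes, min_strength=5.0)
```

In this toy case only the tunnel axis `x` is flagged as non-localizable, which is the direction along which a constrained optimization would withhold its pose update.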
Abstract:LiDAR-based localization and mapping is one of the core components in many modern robotic systems due to the direct integration of range and geometry, allowing for precise motion estimation and generation of high-quality maps in real-time. Yet, as a consequence of insufficient environmental constraints present in the scene, this dependence on geometry can result in localization failure, for example in self-symmetric surroundings such as tunnels. This work addresses precisely this issue by proposing a neural network-based estimation approach for detecting (non-)localizability during robot operation. Special attention is given to the localizability of scan-to-scan registration, as it is a crucial component in many LiDAR odometry estimation pipelines. In contrast to previous, mostly traditional detection approaches, the proposed method enables early detection of failure by estimating the localizability on raw sensor measurements without evaluating the underlying registration optimization. Moreover, previous approaches remain limited in their ability to generalize across environments and sensor types, as heuristic tuning of degeneracy detection thresholds is required. The proposed approach avoids this problem by learning from a corpus of different environments, allowing the network to function over various scenarios. Furthermore, the network is trained exclusively on simulated data, avoiding arduous data collection in challenging and degenerate, often hard-to-access, environments. The presented method is tested during field experiments conducted across challenging environments and on two different sensor types without any modifications. The observed detection performance is on par with state-of-the-art methods after environment-specific threshold tuning.