Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Julian Nubert

Diffusion-Based Approximate MPC: Fast and Consistent Imitation of Multi-Modal Action Distributions

Apr 06, 2025

Pau Marquez Julbe, Julian Nubert, Henrik Hose, Sebastian Trimpe, Katherine J. Kuchenbecker

Abstract:Approximating model predictive control (MPC) using imitation learning (IL) allows for fast control without solving expensive optimization problems online. However, methods that use neural networks in a simple L2-regression setup fail to approximate multi-modal (set-valued) solution distributions caused by local optima found by the numerical solver or non-convex constraints, such as obstacles, significantly limiting the applicability of approximate MPC in practice. We solve this issue by using diffusion models to accurately represent the complete solution distribution (i.e., all modes) at high control rates (more than 1000 Hz). This work shows that diffusion based AMPC significantly outperforms L2-regression-based approximate MPC for multi-modal action distributions. In contrast to most earlier work on IL, we also focus on running the diffusion-based controller at a higher rate and in joint space instead of end-effector space. Additionally, we propose the use of gradient guidance during the denoising process to consistently pick the same mode in closed loop to prevent switching between solutions. We propose using the cost and constraint satisfaction of the original MPC problem during parallel sampling of solutions from the diffusion model to pick a better mode online. We evaluate our method on the fast and accurate control of a 7-DoF robot manipulator both in simulation and on hardware deployed at 250 Hz, achieving a speedup of more than 70 times compared to solving the MPC problem online and also outperforming the numerical optimization (used for training) in success ratio.

Via

Access Paper or Ask Questions

Continuous-Time State Estimation Methods in Robotics: A Survey

Nov 06, 2024

William Talbot, Julian Nubert, Turcan Tuna, Cesar Cadena, Frederike Dümbgen, Jesus Tordesillas, Timothy D. Barfoot, Marco Hutter

Figure 1 for Continuous-Time State Estimation Methods in Robotics: A Survey

Figure 2 for Continuous-Time State Estimation Methods in Robotics: A Survey

Figure 3 for Continuous-Time State Estimation Methods in Robotics: A Survey

Figure 4 for Continuous-Time State Estimation Methods in Robotics: A Survey

Abstract:Accurate, efficient, and robust state estimation is more important than ever in robotics as the variety of platforms and complexity of tasks continue to grow. Historically, discrete-time filters and smoothers have been the dominant approach, in which the estimated variables are states at discrete sample times. The paradigm of continuous-time state estimation proposes an alternative strategy by estimating variables that express the state as a continuous function of time, which can be evaluated at any query time. Not only can this benefit downstream tasks such as planning and control, but it also significantly increases estimator performance and flexibility, as well as reduces sensor preprocessing and interfacing complexity. Despite this, continuous-time methods remain underutilized, potentially because they are less well-known within robotics. To remedy this, this work presents a unifying formulation of these methods and the most exhaustive literature review to date, systematically categorizing prior work by methodology, application, state variables, historical context, and theoretical contribution to the field. By surveying splines and Gaussian processes together and contextualizing works from other research domains, this work identifies and analyzes open problems in continuous-time state estimation and suggests new research directions.

* Submitted to IEEE Transactions on Robotics (T-RO)

Via

Access Paper or Ask Questions

Reinforcement Learning Control for Autonomous Hydraulic Material Handling Machines with Underactuated Tools

Oct 07, 2024

Filippo A. Spinelli, Pascal Egli, Julian Nubert, Fang Nan, Thilo Bleumer, Patrick Goegler, Stephan Brockes, Ferdinand Hofmann, Marco Hutter

Figure 1 for Reinforcement Learning Control for Autonomous Hydraulic Material Handling Machines with Underactuated Tools

Figure 2 for Reinforcement Learning Control for Autonomous Hydraulic Material Handling Machines with Underactuated Tools

Figure 3 for Reinforcement Learning Control for Autonomous Hydraulic Material Handling Machines with Underactuated Tools

Figure 4 for Reinforcement Learning Control for Autonomous Hydraulic Material Handling Machines with Underactuated Tools

Abstract:The precise and safe control of heavy material handling machines presents numerous challenges due to the hard-to-model hydraulically actuated joints and the need for collision-free trajectory planning with a free-swinging end-effector tool. In this work, we propose an RL-based controller that commands the cabin joint and the arm simultaneously. It is trained in a simulation combining data-driven modeling techniques with first-principles modeling. On the one hand, we employ a neural network model to capture the highly nonlinear dynamics of the upper carriage turn hydraulic motor, incorporating explicit pressure prediction to handle delays better. On the other hand, we model the arm as velocity-controllable and the free-swinging end-effector tool as a damped pendulum using first principles. This combined model enhances our simulation environment, enabling the training of RL controllers that can be directly transferred to the real machine. Designed to reach steady-state Cartesian targets, the RL controller learns to leverage the hydraulic dynamics to improve accuracy, maintain high speeds, and minimize end-effector tool oscillations. Our controller, tested on a mid-size prototype material handler, is more accurate than an inexperienced operator and causes fewer tool oscillations. It demonstrates competitive performance even compared to an experienced professional driver.

* Presented at IROS 2024, Abu Dhabi, as oral presentation

Via

Access Paper or Ask Questions

ETHcavation: A Dataset and Pipeline for Panoptic Scene Understanding and Object Tracking in Dynamic Construction Environments

Oct 05, 2024

Lorenzo Terenzi, Julian Nubert, Pol Eyschen, Pascal Roth, Simin Fei, Edo Jelavic, Marco Hutter

Figure 1 for ETHcavation: A Dataset and Pipeline for Panoptic Scene Understanding and Object Tracking in Dynamic Construction Environments

Figure 2 for ETHcavation: A Dataset and Pipeline for Panoptic Scene Understanding and Object Tracking in Dynamic Construction Environments

Figure 3 for ETHcavation: A Dataset and Pipeline for Panoptic Scene Understanding and Object Tracking in Dynamic Construction Environments

Figure 4 for ETHcavation: A Dataset and Pipeline for Panoptic Scene Understanding and Object Tracking in Dynamic Construction Environments

Abstract:Construction sites are challenging environments for autonomous systems due to their unstructured nature and the presence of dynamic actors, such as workers and machinery. This work presents a comprehensive panoptic scene understanding solution designed to handle the complexities of such environments by integrating 2D panoptic segmentation with 3D LiDAR mapping. Our system generates detailed environmental representations in real-time by combining semantic and geometric data, supported by Kalman Filter-based tracking for dynamic object detection. We introduce a fine-tuning method that adapts large pre-trained panoptic segmentation models for construction site applications using a limited number of domain-specific samples. For this use case, we release a first-of-its-kind dataset of 502 hand-labeled sample images with panoptic annotations from construction sites. In addition, we propose a dynamic panoptic mapping technique that enhances scene understanding in unstructured environments. As a case study, we demonstrate the system's application for autonomous navigation, utilizing real-time RRT* for reactive path planning in dynamic scenarios. The dataset (https://leggedrobotics.github.io/panoptic-scene-understanding.github.io/) and code (https://github.com/leggedrobotics/rsl_panoptic_mapping) for training and deployment are publicly available to support future research.

* 9 pages, 7 figures, 4 tables, submitted to 2024 Australasian Conference on Robotics and Automation (ACRA 2024)

Via

Access Paper or Ask Questions

Informed, Constrained, Aligned: A Field Analysis on Degeneracy-aware Point Cloud Registration in the Wild

Aug 21, 2024

Turcan Tuna, Julian Nubert, Patrick Pfreundschuh, Cesar Cadena, Shehryar Khattak, Marco Hutter

Figure 1 for Informed, Constrained, Aligned: A Field Analysis on Degeneracy-aware Point Cloud Registration in the Wild

Figure 2 for Informed, Constrained, Aligned: A Field Analysis on Degeneracy-aware Point Cloud Registration in the Wild

Figure 3 for Informed, Constrained, Aligned: A Field Analysis on Degeneracy-aware Point Cloud Registration in the Wild

Figure 4 for Informed, Constrained, Aligned: A Field Analysis on Degeneracy-aware Point Cloud Registration in the Wild

Abstract:The ICP registration algorithm has been a preferred method for LiDAR-based robot localization for nearly a decade. However, even in modern SLAM solutions, ICP can degrade and become unreliable in geometrically ill-conditioned environments. Current solutions primarily focus on utilizing additional sources of information, such as external odometry, to either replace the degenerate directions of the optimization solution or add additional constraints in a sensor-fusion setup afterward. In response, this work investigates and compares new and existing degeneracy mitigation methods for robust LiDAR-based localization and analyzes the efficacy of these approaches in degenerate environments for the first time in the literature at this scale. Specifically, this work proposes and investigates i) the incorporation of different types of constraints into the ICP algorithm, ii) the effect of using active or passive degeneracy mitigation techniques, and iii) the choice of utilizing global point cloud registration methods on the ill-conditioned ICP problem in LiDAR degenerate environments. The study results are validated through multiple real-world field and simulated experiments. The analysis shows that active optimization degeneracy mitigation is necessary and advantageous in the absence of reliable external estimate assistance for LiDAR-SLAM. Furthermore, introducing degeneracy-aware hard constraints in the optimization before or during the optimization is shown to perform better in the wild than by including the constraints after. Moreover, with heuristic fine-tuned parameters, soft constraints can provide equal or better results in complex ill-conditioned scenarios. The implementations used in the analysis of this work are made publicly available to the community.

* Submitted to IEEE Transactions on Field Robotics

Via

Access Paper or Ask Questions

MultiViPerFrOG: A Globally Optimized Multi-Viewpoint Perception Framework for Camera Motion and Tissue Deformation

Aug 08, 2024

Guido Caccianiga, Julian Nubert, Cesar Cadena, Marco Hutter, Katherine J. Kuchenbecker

Abstract:Reconstructing the 3D shape of a deformable environment from the information captured by a moving depth camera is highly relevant to surgery. The underlying challenge is the fact that simultaneously estimating camera motion and tissue deformation in a fully deformable scene is an ill-posed problem, especially from a single arbitrarily moving viewpoint. Current solutions are often organ-specific and lack the robustness required to handle large deformations. Here we propose a multi-viewpoint global optimization framework that can flexibly integrate the output of low-level perception modules (data association, depth, and relative scene flow) with kinematic and scene-modeling priors to jointly estimate multiple camera motions and absolute scene flow. We use simulated noisy data to show three practical examples that successfully constrain the convergence to a unique solution. Overall, our method shows robustness to combined noisy input measures and can process hundreds of points in a few milliseconds. MultiViPerFrOG builds a generalized learning-free scaffolding for spatio-temporal encoding that can unlock advanced surgical scene representations and will facilitate the development of the computer-assisted-surgery technologies of the future.

Via

Access Paper or Ask Questions

RoadRunner - Learning Traversability Estimation for Autonomous Off-road Driving

Mar 03, 2024

Jonas Frey, Shehryar Khattak, Manthan Patel, Deegan Atha, Julian Nubert, Curtis Padgett, Marco Hutter, Patrick Spieler

Figure 1 for RoadRunner - Learning Traversability Estimation for Autonomous Off-road Driving

Figure 2 for RoadRunner - Learning Traversability Estimation for Autonomous Off-road Driving

Figure 3 for RoadRunner - Learning Traversability Estimation for Autonomous Off-road Driving

Figure 4 for RoadRunner - Learning Traversability Estimation for Autonomous Off-road Driving

Abstract:Autonomous navigation at high speeds in off-road environments necessitates robots to comprehensively understand their surroundings using onboard sensing only. The extreme conditions posed by the off-road setting can cause degraded camera image quality due to poor lighting and motion blur, as well as limited sparse geometric information available from LiDAR sensing when driving at high speeds. In this work, we present RoadRunner, a novel framework capable of predicting terrain traversability and an elevation map directly from camera and LiDAR sensor inputs. RoadRunner enables reliable autonomous navigation, by fusing sensory information, handling of uncertainty, and generation of contextually informed predictions about the geometry and traversability of the terrain while operating at low latency. In contrast to existing methods relying on classifying handcrafted semantic classes and using heuristics to predict traversability costs, our method is trained end-to-end in a self-supervised fashion. The RoadRunner network architecture builds upon popular sensor fusion network architectures from the autonomous driving domain, which embed LiDAR and camera information into a common Bird's Eye View perspective. Training is enabled by utilizing an existing traversability estimation stack to generate training data in hindsight in a scalable manner from real-world off-road driving datasets. Furthermore, RoadRunner improves the system latency by a factor of roughly 4, from 500 ms to 140 ms, while improving the accuracy for traversability costs and elevation map predictions. We demonstrate the effectiveness of RoadRunner in enabling safe and reliable off-road navigation at high speeds in multiple real-world driving scenarios through unstructured desert environments.

* under review for Field Robotics

Via

Access Paper or Ask Questions

Dense 3D Reconstruction Through Lidar: A Comparative Study on Ex-vivo Porcine Tissue

Jan 19, 2024

Guido Caccianiga, Julian Nubert, Marco Hutter, Katherine J. Kuchenbecker

Abstract:New sensing technologies and more advanced processing algorithms are transforming computer-integrated surgery. While researchers are actively investigating depth sensing and 3D reconstruction for vision-based surgical assistance, it remains difficult to achieve real-time, accurate, and robust 3D representations of the abdominal cavity for minimally invasive surgery. Thus, this work uses quantitative testing on fresh ex-vivo porcine tissue to thoroughly characterize the quality with which a 3D laser-based time-of-flight sensor (lidar) can perform anatomical surface reconstruction. Ground-truth surface shapes are captured with a commercial laser scanner, and the resulting signed error fields are analyzed using rigorous statistical tools. When compared to modern learning-based stereo matching from endoscopic images, time-of-flight sensing demonstrates higher precision, lower processing delay, higher frame rate, and superior robustness against sensor distance and poor illumination. Furthermore, we report on the potential negative effect of near-infrared light penetration on the accuracy of lidar measurements across different tissue samples, identifying a significant measured depth offset for muscle in contrast to fat and liver. Our findings highlight the potential of lidar for intraoperative 3D perception and point toward new methods that combine complementary time-of-flight and spectral imaging.

Via

Access Paper or Ask Questions

ViPlanner: Visual Semantic Imperative Learning for Local Navigation

Oct 02, 2023

Pascal Roth, Julian Nubert, Fan Yang, Mayank Mittal, Marco Hutter

Figure 1 for ViPlanner: Visual Semantic Imperative Learning for Local Navigation

Figure 2 for ViPlanner: Visual Semantic Imperative Learning for Local Navigation

Figure 3 for ViPlanner: Visual Semantic Imperative Learning for Local Navigation

Figure 4 for ViPlanner: Visual Semantic Imperative Learning for Local Navigation

Abstract:Real-time path planning in outdoor environments still challenges modern robotic systems due to differences in terrain traversability, diverse obstacles, and the necessity for fast decision-making. Established approaches have primarily focused on geometric navigation solutions, which work well for structured geometric obstacles but have limitations regarding the semantic interpretation of different terrain types and their affordances. Moreover, these methods fail to identify traversable geometric occurrences, such as stairs. To overcome these issues, we introduce ViPlanner, a learned local path planning approach that generates local plans based on geometric and semantic information. The system is trained using the Imperative Learning paradigm, for which the network weights are optimized end-to-end based on the planning task objective. This optimization uses a differentiable formulation of a semantic costmap, which enables the planner to distinguish between the traversability of different terrains and accurately identify obstacles. The semantic information is represented in 30 classes using an RGB colorspace that can effectively encode the multiple levels of traversability. We show that the planner can adapt to diverse real-world environments without requiring any real-world training. In fact, the planner is trained purely in simulation, enabling a highly scalable training data generation. Experimental results demonstrate resistance to noise, zero-shot sim-to-real transfer, and a decrease of 38.02% in terms of traversability cost compared to purely geometric-based approaches. Code and models are made publicly available: https://github.com/leggedrobotics/viplanner.

Via

Access Paper or Ask Questions

X-ICP: Localizability-Aware LiDAR Registration for Robust Localization in Extreme Environments

Nov 30, 2022

Turcan Tuna, Julian Nubert, Yoshua Nava, Shehryar Khattak, Marco Hutter

Figure 1 for X-ICP: Localizability-Aware LiDAR Registration for Robust Localization in Extreme Environments

Figure 2 for X-ICP: Localizability-Aware LiDAR Registration for Robust Localization in Extreme Environments

Figure 3 for X-ICP: Localizability-Aware LiDAR Registration for Robust Localization in Extreme Environments

Figure 4 for X-ICP: Localizability-Aware LiDAR Registration for Robust Localization in Extreme Environments

Abstract:Modern robotic systems are required to operate in challenging environments, which demand reliable localization under challenging conditions. LiDAR-based localization methods, such as the Iterative Closest Point (ICP) algorithm, can suffer in geometrically uninformative environments that are known to deteriorate registration performance and push optimization toward divergence along weakly constrained directions. To overcome this issue, this work proposes i) a robust multi-category (non-)localizability detection module, and ii) a localizability-aware constrained ICP optimization module and couples both in a unified manner. The proposed localizability detection is achieved by utilizing the correspondences between the scan and the map to analyze the alignment strength against the principal directions of the optimization as part of its multi-category LiDAR localizability analysis. In the second part, this localizability analysis is then tightly integrated into the scan-to-map point cloud registration to generate drift-free pose updates along well-constrained directions. The proposed method is thoroughly evaluated and compared to state-of-the-art methods in simulation and during real-world experiments, underlying the gain in performance and reliability in LiDAR-challenging scenarios. In all experiments, the proposed framework demonstrates accurate and generalizable localizability detection and robust pose estimation without environment-specific parameter tuning.

* 17 Pages, 17 Figures Submitted to IEEE Transactions On Robotics. Supplementary Video: https://www.youtube.com/watch?v=f-i-br-l5QA Project Website: https://sites.google.com/leggedrobotics.com/x-icp

Via

Access Paper or Ask Questions