Abstract:This paper reports on learning a reward map for social navigation in dynamic environments where the robot can reason about its path at any time, given agents' trajectories and scene geometry. Humans navigating in dense and dynamic indoor environments often follow several implicit social rules. A rule-based approach fails to model all possible interactions between humans, robots, and scenes. We propose a novel Smooth Maximum Entropy Deep Inverse Reinforcement Learning (S-MEDIRL) algorithm that can extrapolate beyond expert demonstrations to better encode scene navigability from few-shot demonstrations. The agent learns to predict cost maps by reasoning over trajectory data and scene geometry. It then samples a trajectory, which is executed using a local crowd navigation controller. We present results in a photo-realistic simulation environment, with a robot and a human navigating a narrow crossing scenario. The robot implicitly learns to exhibit social behaviors such as yielding to oncoming traffic and avoiding deadlocks. We compare the proposed approach to the popular model-based crowd navigation algorithm ORCA and a rule-based agent that exhibits yielding.
Abstract:In this paper, we present a novel algorithm for probabilistically updating and rasterizing semantic maps within 3D Gaussian Splatting (3D-GS). Although previous methods have introduced algorithms that learn to rasterize features in 3D-GS for enhanced scene understanding, 3D-GS can fail without warning, which presents a challenge for safety-critical robotic applications. To address this gap, we propose a method that advances the literature of continuous semantic mapping from voxels to ellipsoids, combining the precise structure of 3D-GS with the ability of probabilistic robotic maps to quantify uncertainty. Given a set of images, our algorithm performs a probabilistic semantic update directly on the 3D ellipsoids to obtain an expectation and variance through the use of conjugate priors. We also propose a probabilistic rasterization that returns per-pixel segmentation predictions with quantifiable uncertainty. We compare our method with similar probabilistic voxel-based methods to verify our extension to 3D ellipsoids, and perform ablation studies on uncertainty quantification and temporal smoothing.
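As a rough illustration of how a conjugate prior yields both an expectation and a variance per map element, the sketch below assumes a Dirichlet prior over semantic classes with categorical (soft-label) observations. The abstract does not name the prior, so the Dirichlet choice, the function names `dirichlet_update` and `dirichlet_moments`, and the optional per-observation weights are all illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def dirichlet_update(alpha, class_probs, weights=None):
    """Add (optionally weighted) soft-label observations to the
    Dirichlet concentration parameters of one map element.
    NOTE: hypothetical helper, not the paper's API."""
    alpha = np.asarray(alpha, dtype=float)
    class_probs = np.atleast_2d(np.asarray(class_probs, dtype=float))
    if weights is None:
        weights = np.ones(class_probs.shape[0])
    weights = np.asarray(weights, dtype=float)
    # Conjugacy: the posterior is again Dirichlet, with summed pseudo-counts.
    return alpha + weights @ class_probs

def dirichlet_moments(alpha):
    """Per-class expectation and variance of a Dirichlet distribution."""
    alpha = np.asarray(alpha, dtype=float)
    total = alpha.sum()
    mean = alpha / total
    var = mean * (1.0 - mean) / (total + 1.0)
    return mean, var

# One element with a uniform prior over 3 classes, observed in two images.
prior = np.full(3, 1.0)
observations = np.array([[0.8, 0.1, 0.1],
                         [0.7, 0.2, 0.1]])
posterior = dirichlet_update(prior, observations)
mean, var = dirichlet_moments(posterior)
print(mean, var)
```

The variance shrinks as pseudo-counts accumulate, which is one simple way an element observed many times can be distinguished from one that was barely seen.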
Abstract:Tensegrity robots, characterized by a synergistic assembly of rigid rods and elastic cables, form robust structures that are resistant to impacts. However, this design introduces complexities in kinematics and dynamics, complicating control and state estimation. This work presents a novel proprioceptive state estimator for tensegrity robots. The estimator initially uses the geometric constraints of 3-bar prism tensegrity structures, combined with IMU and motor encoder measurements, to reconstruct the robot's shape and orientation. It then employs a contact-aided invariant extended Kalman filter with forward kinematics to estimate the global position and orientation of the tensegrity robot. The state estimator's accuracy is assessed against ground truth data in both simulated environments and real-world tensegrity robot applications. It achieves an average drift percentage of 4.2%, comparable to the state estimation performance of traditional rigid robots. This state estimator advances the state of the art in tensegrity robot state estimation and has the potential to run in real-time using onboard sensors, paving the way for full autonomy of tensegrity robots in unstructured environments.
Abstract:This paper introduces a novel probabilistic mapping algorithm, Latent BKI, which enables open-vocabulary mapping with quantifiable uncertainty. Traditionally, semantic mapping algorithms focus on a fixed set of semantic categories which limits their applicability for complex robotic tasks. Vision-Language (VL) models have recently emerged as a technique to jointly model language and visual features in a latent space, enabling semantic recognition beyond a predefined, fixed set of semantic classes. Latent BKI recurrently incorporates neural embeddings from VL models into a voxel map with quantifiable uncertainty, leveraging the spatial correlations of nearby observations through Bayesian Kernel Inference (BKI). Latent BKI is evaluated against similar explicit semantic mapping and VL mapping frameworks on the popular MatterPort-3D and Semantic KITTI data sets, demonstrating that Latent BKI maintains the probabilistic benefits of continuous mapping with the additional benefit of open-dictionary queries. Real-world experiments demonstrate applicability to challenging indoor environments.
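The idea of recurrently folding nearby observations into a voxel through a kernel can be sketched as follows. This is a simplified illustration under stated assumptions: the compactly supported kernel follows the form commonly used in earlier BKI mapping work, while the weighted-mean update, the use of the accumulated kernel weight as a crude confidence proxy, and the names `sparse_kernel` and `update_voxel` are hypothetical and not taken from Latent BKI.

```python
import numpy as np

def sparse_kernel(dist, length=0.5, sigma0=1.0):
    """Compactly supported kernel: sigma0 at dist = 0, zero for dist >= length."""
    r = np.minimum(np.asarray(dist, dtype=float) / length, 1.0)
    k = sigma0 * ((2.0 + np.cos(2.0 * np.pi * r)) / 3.0 * (1.0 - r)
                  + np.sin(2.0 * np.pi * r) / (2.0 * np.pi))
    return np.maximum(k, 0.0)  # clamp tiny negative round-off at the boundary

def update_voxel(mean, weight, voxel_center, points, features, length=0.5):
    """Fold new observations into a voxel's running feature estimate.
    ASSUMPTION: a kernel-weighted running mean stands in for the paper's update."""
    d = np.linalg.norm(points - voxel_center, axis=1)
    k = sparse_kernel(d, length)
    new_weight = weight + k.sum()
    if new_weight == 0.0:
        return mean, weight  # nothing observed within the kernel support
    new_mean = (weight * mean + k @ features) / new_weight
    return new_mean, new_weight

# A voxel at the origin sees one nearby point and one far-away point.
center = np.zeros(3)
points = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
features = np.array([[1.0, 0.0], [0.0, 1.0]])  # toy 2-D embeddings
m, w = update_voxel(np.zeros(2), 0.0, center, points, features)
print(m, w)
```

Only the nearby point contributes, so the voxel adopts its embedding; the far point falls outside the kernel support and is ignored.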
Abstract:This paper introduces a Multi-modal Diffusion model for Motion Prediction (MDMP) that integrates and synchronizes skeletal data and textual descriptions of actions to generate refined long-term motion predictions with quantifiable uncertainty. Existing methods for motion forecasting or motion generation rely solely on either prior motions or text prompts, facing limitations with precision or control, particularly over extended durations. The multi-modal nature of our approach enhances the contextual understanding of human motion, while our graph-based transformer framework effectively captures both spatial and temporal motion dynamics. As a result, our model consistently outperforms existing generative techniques in accurately predicting long-term motions. Additionally, by leveraging diffusion models' ability to capture different modes of prediction, we estimate uncertainty, significantly improving spatial awareness in human-robot interactions by incorporating zones of presence with varying confidence levels for each body joint.
Abstract:This paper addresses a special Perspective-n-Point (PnP) problem: estimating the optimal pose to align 3D and 2D shapes in real-time without correspondences, termed correspondence-free PnP. While several studies have focused on 3D and 2D shape registration, achieving both real-time and accurate performance remains challenging. This study specifically targets 3D-2D geometric shape registration tasks, applying the recently developed Reproducing Kernel Hilbert Space (RKHS) to address the "big-to-small" issue. An iterative reweighted least squares method is employed to solve the RKHS-based formulation efficiently. Moreover, our work identifies a unique and interesting observability issue in correspondence-free PnP: the numerical ambiguity between rotation and translation. To address this, we propose DynaWeightPnP, introducing a dynamic weighting sub-problem and an alternative searching algorithm designed to enhance pose estimation and alignment accuracy. Experiments were conducted on a typical case, that is, a 3D-2D vascular centerline registration task within Endovascular Image-Guided Interventions (EIGIs). Results demonstrate that the proposed algorithm achieves registration processing rates of 60 Hz (without post-refinement) and 31 Hz (with post-refinement) on modern single-core CPUs, with accuracy comparable to existing methods. These results underscore the suitability of DynaWeightPnP for future robot navigation tasks like EIGIs.
Abstract:A major limitation of minimally invasive surgery is the difficulty in accurately locating the internal anatomical structures of the target organ due to the lack of tactile feedback and transparency. Augmented reality (AR) offers a promising solution to overcome this challenge. Numerous studies have shown that combining learning-based and geometric methods can achieve accurate preoperative and intraoperative data registration. This work proposes a real-time monocular 3D tracking algorithm for post-registration tasks. The ORB-SLAM2 framework is adopted and modified for prior-based 3D tracking. The primitive 3D shape is used for fast initialization of the monocular SLAM. A pseudo-segmentation strategy is employed to separate the target organ from the background for tracking purposes, and the geometric prior of the 3D shape is incorporated as an additional constraint in the pose graph. Experiments from in-vivo and ex-vivo tests demonstrate that the proposed 3D tracking system provides robust 3D tracking and effectively handles typical challenges such as fast motion, out-of-field-of-view scenarios, partial visibility, and "organ-background" relative motion.
Abstract:This paper introduces a robust unsupervised SE(3) point cloud registration method that operates without requiring point correspondences. The method frames point clouds as functions in a reproducing kernel Hilbert space (RKHS), leveraging SE(3)-equivariant features for direct feature space registration. A novel RKHS distance metric is proposed, offering reliable performance amidst noise, outliers, and asymmetrical data. An unsupervised training approach is introduced to effectively handle limited ground truth data, facilitating adaptation to real datasets. The proposed method outperforms classical and supervised methods in terms of registration accuracy on both synthetic (ModelNet40) and real-world (ETH3D) noisy, outlier-rich datasets. To the best of our knowledge, this marks the first instance of successful real RGB-D odometry data registration using an equivariant method. The code is available at https://sites.google.com/view/eccv24-equivalign
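To make the notion of treating point clouds as functions in an RKHS concrete, the sketch below computes the standard squared RKHS distance between two clouds, each represented as the average of kernel functions centered at its points. It is an illustration only: it uses raw 3D coordinates and a Gaussian kernel with an assumed length scale, whereas the paper relies on learned SE(3)-equivariant features and proposes its own distance metric. The names `rbf_gram` and `rkhs_sq_distance` are hypothetical.

```python
import numpy as np

def rbf_gram(a, b, lengthscale=0.5):
    """Gaussian kernel matrix between point sets a (n, 3) and b (m, 3)."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * lengthscale ** 2))

def rkhs_sq_distance(x, z, lengthscale=0.5):
    """||f_x - f_z||^2 with f_x = mean_i k(., x_i); no correspondences needed."""
    return (rbf_gram(x, x, lengthscale).mean()
            - 2.0 * rbf_gram(x, z, lengthscale).mean()
            + rbf_gram(z, z, lengthscale).mean())

rng = np.random.default_rng(0)
cloud = rng.normal(size=(50, 3))
shuffled = cloud[::-1]       # same points, different ordering
shifted = cloud + 2.0        # translated copy

print(rkhs_sq_distance(cloud, shuffled))  # essentially zero: ordering is irrelevant
print(rkhs_sq_distance(cloud, shifted))   # clearly positive: clouds are misaligned
```

Because the distance sums over all point pairs, it never asks which point matches which, and a registration method can minimize it over candidate poses.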
Abstract:Partial point cloud registration is a challenging problem in robotics, especially when the robot undergoes a large transformation, causing a significant initial pose error and a low overlap between measurements. This work proposes exploiting equivariant learning from 3D point clouds to improve registration robustness. We propose SE3ET, an SE(3)-equivariant registration framework that employs equivariant point convolution and equivariant transformer designs to learn expressive and robust geometric features. We tested the proposed registration method on indoor and outdoor benchmarks where the point clouds are under arbitrary transformations and low overlapping ratios. We also provide generalization tests and run-time performance.
Abstract:This study addresses the challenge of integrating social norms into robot navigation, which is essential for ensuring that robots operate safely and efficiently in human-centric environments. Social norms, often unspoken and implicitly understood among people, are difficult to explicitly define and implement in robotic systems. To overcome this, we derive these norms from real human trajectory data, utilizing the comprehensive ATC dataset to identify the minimum social zones humans and robots must respect. These zones are integrated into the robot's navigation system by applying barrier functions, ensuring the robot consistently remains within the designated safety set. Simulation results demonstrate that our system effectively mimics human-like navigation strategies, such as passing on the right side and adjusting speed or pausing in constrained spaces. The proposed framework is versatile, easily comprehensible, and tunable, demonstrating the potential to advance the development of robots designed to navigate effectively in human-centric environments.
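How a barrier function keeps a robot outside a social zone can be sketched with a minimal safety filter. The example assumes a single-integrator robot (velocity is the control input), a stationary human, and a circular zone with a made-up radius; the zones in the study are derived from ATC trajectory data and need not be circular. The function name `cbf_safety_filter` and the parameters `radius` and `alpha` are hypothetical.

```python
import numpy as np

def cbf_safety_filter(x, u_nom, human_pos, radius=0.5, alpha=1.0):
    """Return the velocity closest to u_nom that satisfies the barrier
    condition dh/dt + alpha * h >= 0, with h(x) = ||x - p||^2 - radius^2.
    ASSUMPTIONS: single-integrator robot, static human, circular zone."""
    x, u_nom, human_pos = (np.asarray(v, dtype=float)
                           for v in (x, u_nom, human_pos))
    diff = x - human_pos
    h = diff @ diff - radius ** 2   # h >= 0 means the robot is outside the zone
    a = 2.0 * diff                  # gradient of h with respect to x
    b = -alpha * h                  # barrier condition reads a @ u >= b
    slack = a @ u_nom - b
    if slack >= 0.0 or not np.any(a):
        return u_nom                # nominal command is already safe
    # Closed-form solution of the one-constraint QP: project onto a @ u = b.
    return u_nom - slack / (a @ a) * a

# Robot at the origin heading straight for a human 1 m ahead.
u = cbf_safety_filter(x=[0.0, 0.0], u_nom=[1.0, 0.0], human_pos=[1.0, 0.0])
print(u)  # [0.375 0.   ]: the approach speed is reduced
u_away = cbf_safety_filter(x=[0.0, 0.0], u_nom=[-1.0, 0.0], human_pos=[1.0, 0.0])
print(u_away)  # unchanged, since moving away never violates the condition
```

The filter only intervenes when the nominal command would violate the barrier condition, which is why behaviors such as slowing or pausing can emerge near people without a separate rule for each case.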