Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Michael Kaess

Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation

Jun 17, 2025

Carolina Higuera, Akash Sharma, Taosha Fan, Chaithanya Krishna Bodduluri, Byron Boots, Michael Kaess, Mike Lambeta, Tingfan Wu, Zixi Liu, Francois Robert Hogan(+1 more)

Abstract:We present Sparsh-X, the first multisensory touch representations across four tactile modalities: image, audio, motion, and pressure. Trained on ~1M contact-rich interactions collected with the Digit 360 sensor, Sparsh-X captures complementary touch signals at diverse temporal and spatial scales. By leveraging self-supervised learning, Sparsh-X fuses these modalities into a unified representation that captures physical properties useful for robot manipulation tasks. We study how to effectively integrate real-world touch representations for both imitation learning and tactile adaptation of sim-trained policies, showing that Sparsh-X boosts policy success rates by 63% over an end-to-end model using tactile images and improves robustness by 90% in recovering object states from touch. Finally, we benchmark Sparsh-X ability to make inferences about physical properties, such as object-action identification, material-quantity estimation, and force estimation. Sparsh-X improves accuracy in characterizing physical properties by 48% compared to end-to-end approaches, demonstrating the advantages of multisensory pretraining for capturing features essential for dexterous manipulation.

Via

Access Paper or Ask Questions

Self-supervised perception for tactile skin covered dexterous hands

May 16, 2025

Akash Sharma, Carolina Higuera, Chaithanya Krishna Bodduluri, Zixi Liu, Taosha Fan, Tess Hellebrekers, Mike Lambeta, Byron Boots, Michael Kaess, Tingfan Wu(+2 more)

Abstract:We present Sparsh-skin, a pre-trained encoder for magnetic skin sensors distributed across the fingertips, phalanges, and palm of a dexterous robot hand. Magnetic tactile skins offer a flexible form factor for hand-wide coverage with fast response times, in contrast to vision-based tactile sensors that are restricted to the fingertips and limited by bandwidth. Full hand tactile perception is crucial for robot dexterity. However, a lack of general-purpose models, challenges with interpreting magnetic flux and calibration have limited the adoption of these sensors. Sparsh-skin, given a history of kinematic and tactile sensing across a hand, outputs a latent tactile embedding that can be used in any downstream task. The encoder is self-supervised via self-distillation on a variety of unlabeled hand-object interactions using an Allegro hand sensorized with Xela uSkin. In experiments across several benchmark tasks, from state estimation to policy learning, we find that pretrained Sparsh-skin representations are both sample efficient in learning downstream tasks and improve task performance by over 41% compared to prior work and over 56% compared to end-to-end learning.

* 18 pages, 15 figures

Via

Access Paper or Ask Questions

Acoustic Neural 3D Reconstruction Under Pose Drift

Mar 11, 2025

Tianxiang Lin, Mohamad Qadri, Kevin Zhang, Adithya Pediredla, Christopher A. Metzler, Michael Kaess

Abstract:We consider the problem of optimizing neural implicit surfaces for 3D reconstruction using acoustic images collected with drifting sensor poses. The accuracy of current state-of-the-art 3D acoustic modeling algorithms is highly dependent on accurate pose estimation; small errors in sensor pose can lead to severe reconstruction artifacts. In this paper, we propose an algorithm that jointly optimizes the neural scene representation and sonar poses. Our algorithm does so by parameterizing the 6DoF poses as learnable parameters and backpropagating gradients through the neural renderer and implicit representation. We validated our algorithm on both real and simulated datasets. It produces high-fidelity 3D reconstructions even under significant pose drift.

* 8 pages, 8 figures. This paper is under review

Via

Access Paper or Ask Questions

Your Learned Constraint is Secretly a Backward Reachable Tube

Jan 26, 2025

Mohamad Qadri, Gokul Swamy, Jonathan Francis, Michael Kaess, Andrea Bajcsy

Abstract:Inverse Constraint Learning (ICL) is the problem of inferring constraints from safe (i.e., constraint-satisfying) demonstrations. The hope is that these inferred constraints can then be used downstream to search for safe policies for new tasks and, potentially, under different dynamics. Our paper explores the question of what mathematical entity ICL recovers. Somewhat surprisingly, we show that both in theory and in practice, ICL recovers the set of states where failure is inevitable, rather than the set of states where failure has already happened. In the language of safe control, this means we recover a backwards reachable tube (BRT) rather than a failure set. In contrast to the failure set, the BRT depends on the dynamics of the data collection system. We discuss the implications of the dynamics-conditionedness of the recovered constraint on both the sample-efficiency of policy search and the transferability of learned constraints.

* 12 pages, 3 figures

Via

Access Paper or Ask Questions

NormalFlow: Fast, Robust, and Accurate Contact-based Object 6DoF Pose Tracking with Vision-based Tactile Sensors

Dec 12, 2024

Hung-Jui Huang, Michael Kaess, Wenzhen Yuan

Abstract:Tactile sensing is crucial for robots aiming to achieve human-level dexterity. Among tactile-dependent skills, tactile-based object tracking serves as the cornerstone for many tasks, including manipulation, in-hand manipulation, and 3D reconstruction. In this work, we introduce NormalFlow, a fast, robust, and real-time tactile-based 6DoF tracking algorithm. Leveraging the precise surface normal estimation of vision-based tactile sensors, NormalFlow determines object movements by minimizing discrepancies between the tactile-derived surface normals. Our results show that NormalFlow consistently outperforms competitive baselines and can track low-texture objects like table surfaces. For long-horizon tracking, we demonstrate when rolling the sensor around a bead for 360 degrees, NormalFlow maintains a rotational tracking error of 2.5 degrees. Additionally, we present state-of-the-art tactile-based 3D reconstruction results, showcasing the high accuracy of NormalFlow. We believe NormalFlow unlocks new possibilities for high-precision perception and manipulation tasks that involve interacting with objects using hands. The video demo, code, and dataset are available on our website: https://joehjhuang.github.io/normalflow.

* IEEE Robotics and Automation Letters ( Volume: 10, Issue: 1, January 2025)
* 8 pages, published in 2024 RA-L, website link: https://joehjhuang.github.io/normalflow

Via

Access Paper or Ask Questions

Sparsh: Self-supervised touch representations for vision-based tactile sensing

Oct 31, 2024

Carolina Higuera, Akash Sharma, Chaithanya Krishna Bodduluri, Taosha Fan, Patrick Lancaster, Mrinal Kalakrishnan, Michael Kaess, Byron Boots, Mike Lambeta, Tingfan Wu(+1 more)

Abstract:In this work, we introduce general purpose touch representations for the increasingly accessible class of vision-based tactile sensors. Such sensors have led to many recent advances in robot manipulation as they markedly complement vision, yet solutions today often rely on task and sensor specific handcrafted perception models. Collecting real data at scale with task centric ground truth labels, like contact forces and slip, is a challenge further compounded by sensors of various form factor differing in aspects like lighting and gel markings. To tackle this we turn to self-supervised learning (SSL) that has demonstrated remarkable performance in computer vision. We present Sparsh, a family of SSL models that can support various vision-based tactile sensors, alleviating the need for custom labels through pre-training on 460k+ tactile images with masking and self-distillation in pixel and latent spaces. We also build TacBench, to facilitate standardized benchmarking across sensors and models, comprising of six tasks ranging from comprehending tactile properties to enabling physical perception and manipulation planning. In evaluations, we find that SSL pre-training for touch representation outperforms task and sensor-specific end-to-end training by 95.1% on average over TacBench, and Sparsh (DINO) and Sparsh (IJEPA) are the most competitive, indicating the merits of learning in latent space for tactile images. Project page: https://sparsh-ssl.github.io/

* Conference on Robot Learning (CoRL), 2024

Via

Access Paper or Ask Questions

LiPO: LiDAR Inertial Odometry for ICP Comparison

Oct 10, 2024

Darwin Mick, Taylor Pool, Madankumar Sathenahally Nagaraju, Michael Kaess, Howie Choset, Matt Travers

Figure 1 for LiPO: LiDAR Inertial Odometry for ICP Comparison

Figure 2 for LiPO: LiDAR Inertial Odometry for ICP Comparison

Figure 3 for LiPO: LiDAR Inertial Odometry for ICP Comparison

Figure 4 for LiPO: LiDAR Inertial Odometry for ICP Comparison

Abstract:We introduce a LiDAR inertial odometry (LIO) framework, called LiPO, that enables direct comparisons of different iterative closest point (ICP) point cloud registration methods. The two common ICP methods we compare are point-to-point (P2P) and point-to-feature (P2F). In our experience, within the context of LIO, P2F-ICP results in less drift and improved mapping accuracy when robots move aggressively through challenging environments when compared to P2P-ICP. However, P2F-ICP methods require more hand-tuned hyper-parameters that make P2F-ICP less general across all environments and motions. In real-world field robotics applications where robots are used across different environments, more general P2P-ICP methods may be preferred despite increased drift. In this paper, we seek to better quantify the trade-off between P2P-ICP and P2F-ICP to help inform when each method should be used. To explore this trade-off, we use LiPO to directly compare ICP methods and test on relevant benchmark datasets as well as on our custom unpiloted ground vehicle (UGV). We find that overall, P2F-ICP has reduced drift and improved mapping accuracy, but, P2P-ICP is more consistent across all environments and motions with minimal drift increase.

* Submitted to ICRA 2025

Via

Access Paper or Ask Questions

BEVLoc: Cross-View Localization and Matching via Birds-Eye-View Synthesis

Oct 08, 2024

Christopher Klammer, Michael Kaess

Figure 1 for BEVLoc: Cross-View Localization and Matching via Birds-Eye-View Synthesis

Figure 2 for BEVLoc: Cross-View Localization and Matching via Birds-Eye-View Synthesis

Figure 3 for BEVLoc: Cross-View Localization and Matching via Birds-Eye-View Synthesis

Figure 4 for BEVLoc: Cross-View Localization and Matching via Birds-Eye-View Synthesis

Abstract:Ground to aerial matching is a crucial and challenging task in outdoor robotics, particularly when GPS is absent or unreliable. Structures like buildings or large dense forests create interference, requiring GNSS replacements for global positioning estimates. The true difficulty lies in reconciling the perspective difference between the ground and air images for acceptable localization. Taking inspiration from the autonomous driving community, we propose a novel framework for synthesizing a birds-eye-view (BEV) scene representation to match and localize against an aerial map in off-road environments. We leverage contrastive learning with domain specific hard negative mining to train a network to learn similar representations between the synthesized BEV and the aerial map. During inference, BEVLoc guides the identification of the most probable locations within the aerial map through a coarse-to-fine matching strategy. Our results demonstrate promising initial outcomes in extremely difficult forest environments with limited semantic diversity. We analyze our model's performance for coarse and fine matching, assessing both the raw matching capability of our model and its performance as a GNSS replacement. Our work delves into off-road map localization while establishing a foundational baseline for future developments in localization. Our code is available at: https://github.com/rpl-cmu/bevloc

* 8 pages, 6 figures, Conference: IROS 2024

Via

Access Paper or Ask Questions

iMESA: Incremental Distributed Optimization for Collaborative Simultaneous Localization and Mapping

Jun 11, 2024

Daniel McGann, Michael Kaess

Figure 1 for iMESA: Incremental Distributed Optimization for Collaborative Simultaneous Localization and Mapping

Figure 2 for iMESA: Incremental Distributed Optimization for Collaborative Simultaneous Localization and Mapping

Figure 3 for iMESA: Incremental Distributed Optimization for Collaborative Simultaneous Localization and Mapping

Figure 4 for iMESA: Incremental Distributed Optimization for Collaborative Simultaneous Localization and Mapping

Abstract:This paper introduces a novel incremental distributed back-end algorithm for Collaborative Simultaneous Localization and Mapping (C-SLAM). For real-world deployments, robotic teams require algorithms to compute a consistent state estimate accurately, within online runtime constraints, and with potentially limited communication. Existing centralized, decentralized, and distributed approaches to solving C-SLAM problems struggle to achieve all of these goals. To address this capability gap, we present Incremental Manifold Edge-based Separable ADMM (iMESA) a fully distributed C-SLAM back-end algorithm that can provide a multi-robot team with accurate state estimates in real-time with only sparse pair-wise communication between robots. Extensive evaluation on real and synthetic data demonstrates that iMESA is able to outperform comparable state-of-the-art C-SLAM back-ends.

* Accepted to Robotic Science and Systems (RSS) 2024 in Delft, Netherlands

Via

Access Paper or Ask Questions

A Slices Perspective for Incremental Nonparametric Inference in High Dimensional State Spaces

May 26, 2024

Moshe Shienman, Ohad Levy-Or, Michael Kaess, Vadim Indelman

Figure 1 for A Slices Perspective for Incremental Nonparametric Inference in High Dimensional State Spaces

Figure 2 for A Slices Perspective for Incremental Nonparametric Inference in High Dimensional State Spaces

Figure 3 for A Slices Perspective for Incremental Nonparametric Inference in High Dimensional State Spaces

Figure 4 for A Slices Perspective for Incremental Nonparametric Inference in High Dimensional State Spaces

Abstract:We introduce an innovative method for incremental nonparametric probabilistic inference in high-dimensional state spaces. Our approach leverages \slices from high-dimensional surfaces to efficiently approximate posterior distributions of any shape. Unlike many existing graph-based methods, our \slices perspective eliminates the need for additional intermediate reconstructions, maintaining a more accurate representation of posterior distributions. Additionally, we propose a novel heuristic to balance between accuracy and efficiency, enabling real-time operation in nonparametric scenarios. In empirical evaluations on synthetic and real-world datasets, our \slices approach consistently outperforms other state-of-the-art methods. It demonstrates superior accuracy and achieves a significant reduction in computational complexity, often by an order of magnitude.

* 8 Pages, 7 figures, Submitted to IEEE IROS 2024

Via

Access Paper or Ask Questions