Francois Robert Hogan

Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation

Jun 17, 2025

Carolina Higuera, Akash Sharma, Taosha Fan, Chaithanya Krishna Bodduluri, Byron Boots, Michael Kaess, Mike Lambeta, Tingfan Wu, Zixi Liu, Francois Robert Hogan(+1 more)

Abstract: We present Sparsh-X, the first multisensory touch representation spanning four tactile modalities: image, audio, motion, and pressure. Trained on ~1M contact-rich interactions collected with the Digit 360 sensor, Sparsh-X captures complementary touch signals at diverse temporal and spatial scales. By leveraging self-supervised learning, Sparsh-X fuses these modalities into a unified representation that captures physical properties useful for robot manipulation tasks. We study how to effectively integrate real-world touch representations for both imitation learning and tactile adaptation of sim-trained policies, showing that Sparsh-X boosts policy success rates by 63% over an end-to-end model using tactile images and improves robustness by 90% in recovering object states from touch. Finally, we benchmark Sparsh-X's ability to make inferences about physical properties, such as object-action identification, material-quantity estimation, and force estimation. Sparsh-X improves accuracy in characterizing physical properties by 48% compared to end-to-end approaches, demonstrating the advantages of multisensory pretraining for capturing features essential for dexterous manipulation.

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

Jun 11, 2025

Mido Assran, Adrien Bardes, David Fan, Quentin Garrido, Russell Howes, Mojtaba Komeili, Matthew Muckley, Ammar Rizvi, Claire Roberts(+20 more)

Abstract: A major challenge for modern AI is to learn to understand the world and learn to act largely by observation. This paper explores a self-supervised approach that combines internet-scale video data with a small amount of interaction data (robot trajectories), to develop models capable of understanding, predicting, and planning in the physical world. We first pre-train an action-free joint-embedding-predictive architecture, V-JEPA 2, on a video and image dataset comprising over 1 million hours of internet video. V-JEPA 2 achieves strong performance on motion understanding (77.3 top-1 accuracy on Something-Something v2) and state-of-the-art performance on human action anticipation (39.7 recall-at-5 on Epic-Kitchens-100) surpassing previous task-specific models. Additionally, after aligning V-JEPA 2 with a large language model, we demonstrate state-of-the-art performance on multiple video question-answering tasks at the 8 billion parameter scale (e.g., 84.0 on PerceptionTest, 76.9 on TempCompass). Finally, we show how self-supervised learning can be applied to robotic planning tasks by post-training a latent action-conditioned world model, V-JEPA 2-AC, using less than 62 hours of unlabeled robot videos from the Droid dataset. We deploy V-JEPA 2-AC zero-shot on Franka arms in two different labs and enable picking and placing of objects using planning with image goals. Notably, this is achieved without collecting any data from the robots in these environments, and without any task-specific training or reward. This work demonstrates how self-supervised learning from web-scale data and a small amount of robot interaction data can yield a world model capable of planning in the physical world.

* 48 pages, 19 figures
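
Planning with image goals in a latent world model can be illustrated with a small sketch. It assumes that planning samples candidate action sequences, rolls them out through the action-conditioned model in latent space, scores each by the L1 distance between the predicted and goal embeddings, and refines the candidates with the cross-entropy method (CEM). The world model here is a hypothetical toy (an action simply shifts a 2-D latent), and the names world_model, rollout, energy and cem_plan, as well as every hyperparameter, are illustrative assumptions rather than the V-JEPA 2-AC implementation.

```python
import random

def world_model(z, action):
    """Hypothetical toy latent dynamics: the action shifts the latent state."""
    return [zi + ai for zi, ai in zip(z, action)]

def rollout(z0, actions):
    """Predict the final latent after applying a sequence of actions."""
    z = list(z0)
    for action in actions:
        z = world_model(z, action)
    return z

def energy(z, goal):
    """L1 distance between a predicted latent and the goal latent."""
    return sum(abs(zi - gi) for zi, gi in zip(z, goal))

def cem_plan(z0, goal, horizon=3, dim=2, samples=64, elites=8, iters=20, seed=0):
    """Cross-entropy method: repeatedly refit a diagonal Gaussian over action
    sequences to the lowest-energy samples, then return its mean as the plan."""
    rng = random.Random(seed)
    mean = [[0.0] * dim for _ in range(horizon)]
    std = [[1.0] * dim for _ in range(horizon)]
    for _ in range(iters):
        scored = []
        for _ in range(samples):
            plan = [[rng.gauss(mean[t][i], std[t][i]) for i in range(dim)]
                    for t in range(horizon)]
            scored.append((energy(rollout(z0, plan), goal), plan))
        scored.sort(key=lambda item: item[0])
        best = [plan for _, plan in scored[:elites]]
        for t in range(horizon):
            for i in range(dim):
                values = [plan[t][i] for plan in best]
                m = sum(values) / len(values)
                var = sum((v - m) ** 2 for v in values) / len(values)
                mean[t][i] = m
                std[t][i] = max(var ** 0.5, 1e-3)  # floor keeps some exploration
    return mean
```

In a receding-horizon loop only the first action of the returned plan would be executed before replanning from the new observation. CEM is used in this sketch because it needs no gradients through the world model.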

Self-supervised perception for tactile skin covered dexterous hands

May 16, 2025

Akash Sharma, Carolina Higuera, Chaithanya Krishna Bodduluri, Zixi Liu, Taosha Fan, Tess Hellebrekers, Mike Lambeta, Byron Boots, Michael Kaess, Tingfan Wu(+2 more)

Abstract: We present Sparsh-skin, a pre-trained encoder for magnetic skin sensors distributed across the fingertips, phalanges, and palm of a dexterous robot hand. Magnetic tactile skins offer a flexible form factor for hand-wide coverage with fast response times, in contrast to vision-based tactile sensors that are restricted to the fingertips and limited by bandwidth. Full-hand tactile perception is crucial for robot dexterity. However, the lack of general-purpose models, together with challenges in interpreting magnetic flux and in calibration, has limited the adoption of these sensors. Sparsh-skin, given a history of kinematic and tactile sensing across a hand, outputs a latent tactile embedding that can be used in any downstream task. The encoder is self-supervised via self-distillation on a variety of unlabeled hand-object interactions using an Allegro hand sensorized with Xela uSkin. In experiments across several benchmark tasks, from state estimation to policy learning, we find that pretrained Sparsh-skin representations are both sample efficient in learning downstream tasks and improve task performance by over 41% compared to prior work and over 56% compared to end-to-end learning.

* 18 pages, 15 figures
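
Self-distillation is commonly implemented in the DINO style: a student network is trained to match the centered, sharpened output distribution of a teacher whose weights are an exponential moving average (EMA) of the student's. The sketch below shows that loss and the EMA update on plain Python lists. The temperatures, momentum values and function names are assumptions for illustration, not Sparsh-skin's published settings, and whether Sparsh-skin uses exactly this recipe is also an assumption.

```python
import math

def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

def softmax(logits, temperature):
    return [math.exp(v) for v in log_softmax(logits, temperature)]

def self_distillation_loss(student_logits, teacher_logits, center,
                           student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the teacher's centered, sharpened distribution and
    the student's distribution. In an autodiff framework the teacher side would
    be detached so that no gradient flows into the teacher."""
    targets = softmax([t - c for t, c in zip(teacher_logits, center)], teacher_temp)
    log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lp for p, lp in zip(targets, log_probs))

def ema(old, new, momentum):
    """Exponential moving average, used for both teacher weights and the center."""
    return [momentum * o + (1.0 - momentum) * n for o, n in zip(old, new)]

# Hypothetical training-step outline (weights shown as flat lists):
#   loss = self_distillation_loss(student(view_a), teacher(view_b), center)
#   ...gradient step on the student only...
#   teacher_weights = ema(teacher_weights, student_weights, momentum=0.996)
#   center = ema(center, teacher_logits, momentum=0.9)
```

Centering and the low teacher temperature pull in opposite directions (one flattens, the other sharpens), which is the usual way this recipe avoids collapsing to a constant output.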

Hypernetworks for Zero-shot Transfer in Reinforcement Learning

Nov 28, 2022

Sahand Rezaei-Shoshtari, Charlotte Morissette, Francois Robert Hogan, Gregory Dudek, David Meger

Abstract: In this paper, hypernetworks are trained to generate behaviors across a range of unseen task conditions, via a novel TD-based training objective and data from a set of near-optimal RL solutions for training tasks. This work relates to meta RL, contextual RL, and transfer learning, with a particular focus on zero-shot performance at test time, enabled by knowledge of the task parameters (also known as context). Our technical approach views each RL algorithm as a mapping from the MDP specifics to the near-optimal value function and policy, and seeks to approximate it with a hypernetwork that can generate near-optimal value functions and policies, given the parameters of the MDP. We show that, under certain conditions, this mapping can be considered as a supervised learning problem. We empirically evaluate the effectiveness of our method for zero-shot transfer to new reward and transition dynamics on a series of continuous control tasks from DeepMind Control Suite. Our method demonstrates significant improvements over baselines from multitask and meta RL approaches.

* AAAI 2023
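
The supervised view of the context-to-policy mapping can be shown in miniature. In this sketch a linear hypernetwork takes a task context and outputs the parameters of a simple linear policy, and it is fitted by stochastic gradient descent on squared error to (context, near-optimal parameters) pairs. This swaps in plain regression for the paper's TD-based objective; the linear policy, the training pairs and all function names are hypothetical.

```python
def hypernet_forward(weights, context):
    """Linear hypernetwork: maps a task context to policy parameters."""
    x = list(context) + [1.0]  # append a constant input for the bias term
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def train_hypernet(dataset, context_dim, n_policy_params, lr=0.05, epochs=2000):
    """Fit the hypernetwork by SGD on squared error to (context, target) pairs."""
    weights = [[0.0] * (context_dim + 1) for _ in range(n_policy_params)]
    for _ in range(epochs):
        for context, target in dataset:
            x = list(context) + [1.0]
            pred = hypernet_forward(weights, context)
            for i, row in enumerate(weights):
                err = pred[i] - target[i]
                for j in range(len(row)):
                    row[j] -= lr * err * x[j]
    return weights

def linear_policy(params, state):
    """Hypothetical generated policy: action = -gain * state + offset."""
    gain, offset = params
    return -gain * state + offset

# Hypothetical near-optimal policy parameters for four training tasks.
train_tasks = [((c,), [2 * c + 1, -c]) for c in (0.0, 0.5, 1.0, 1.5)]
weights = train_hypernet(train_tasks, context_dim=1, n_policy_params=2)
params_unseen = hypernet_forward(weights, (0.75,))  # zero-shot: c = 0.75 never trained on
action = linear_policy(params_unseen, state=0.2)
```

Because the toy targets are an exact linear function of the context, the fitted hypernetwork generates approximately [2.5, -0.75] for the unseen context 0.75, which is the zero-shot step in its simplest form.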

Learning Intuitive Physics with Multimodal Generative Models

Jan 19, 2021

Sahand Rezaei-Shoshtari, Francois Robert Hogan, Michael Jenkin, David Meger, Gregory Dudek

Abstract: Predicting the future interaction of objects when they come into contact with their environment is key for autonomous agents to take intelligent and anticipatory actions. This paper presents a perception framework that fuses visual and tactile feedback to make predictions about the expected motion of objects in dynamic scenes. Visual information captures object properties such as 3D shape and location, while tactile information provides critical cues about interaction forces and resulting object motion when it makes contact with the environment. Utilizing a novel See-Through-your-Skin (STS) sensor that provides high resolution multimodal sensing of contact surfaces, our system captures both the visual appearance and the tactile properties of objects. We interpret the dual stream signals from the sensor using a Multimodal Variational Autoencoder (MVAE), allowing us to capture both modalities of contacting objects and to develop a mapping from visual to tactile interaction and vice-versa. Additionally, the perceptual system can be used to infer the outcome of future physical interactions, which we validate through simulated and real-world experiments in which the resting state of an object is predicted from given initial conditions.

* AAAI 2021
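
The standard MVAE formulation fuses per-modality Gaussian posteriors with a product of experts, which is what allows a model to encode from vision alone, touch alone, or both, and so to map from one modality to the other. A minimal sketch of that fusion for diagonal Gaussians, assuming a standard-normal prior expert, is given below; whether the STS pipeline uses exactly this fusion is an assumption, and the function names are hypothetical.

```python
import math
import random

def product_of_experts(experts, dim):
    """Fuse diagonal-Gaussian experts, given as (means, variances) lists, with a
    standard-normal prior expert. Precisions add; means are precision-weighted."""
    precision = [1.0] * dim  # the prior N(0, 1) contributes precision 1, mean 0
    weighted = [0.0] * dim
    for means, variances in experts:
        for i in range(dim):
            p = 1.0 / variances[i]
            precision[i] += p
            weighted[i] += p * means[i]
    mean = [w / p for w, p in zip(weighted, precision)]
    var = [1.0 / p for p in precision]
    return mean, var

def sample_latent(mean, var, rng):
    """Reparameterized sample z = mean + std * eps."""
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mean, var)]
```

With a vision expert N(1, 1) and a touch expert N(3, 1) the fused posterior is N(4/3, 1/3); dropping the touch expert gives N(1/2, 1/2), which is how a missing modality is handled at inference time before decoding into the other modality.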

Seeing Through your Skin: Recognizing Objects with a Novel Visuotactile Sensor

Dec 14, 2020

Francois Robert Hogan, Michael Jenkin, Sahand Rezaei-Shoshtari, Yogesh Girdhar, David Meger, Gregory Dudek

Abstract: We introduce a new class of vision-based sensor and associated algorithmic processes that combine visual imaging with high-resolution tactile sensing, all in a uniform hardware and computational architecture. We demonstrate the sensor's efficacy for both multi-modal object recognition and metrology. Object recognition is typically formulated as a unimodal task, but by combining two sensor modalities we show that we can achieve several significant performance improvements. This sensor, named the See-Through-your-Skin sensor (STS), is designed to provide rich multi-modal sensing of contact surfaces. Inspired by recent developments in optical tactile sensing technology, we address a key missing feature of these sensors: the ability to capture a visual perspective of the region beyond the contact surface. Whereas optical tactile sensors are typically opaque, we present a sensor with a semitransparent skin that has the dual capabilities of acting as a tactile sensor and/or as a visual camera depending on its internal lighting conditions. This paper details the design of the sensor, showcases its dual sensing capabilities, and presents a deep learning architecture that fuses vision and touch. We validate the ability of the sensor to classify household objects, recognize fine textures, and infer their physical properties both through numerical simulations and experiments with a smart countertop prototype.

* A version of this paper appears in WACV 2021

Reactive Planar Manipulation with Convex Hybrid MPC

Sep 04, 2018

Francois Robert Hogan, Eudald Romo Grau, Alberto Rodriguez

Abstract: This paper presents a reactive controller for planar manipulation tasks that leverages machine learning to achieve real-time performance. The approach is based on a Model Predictive Control (MPC) formulation, where the goal is to find an optimal sequence of robot motions to achieve a desired object motion. Due to the multiple contact modes associated with frictional interactions, the resulting optimization program suffers from combinatorial complexity when tasked with determining the optimal sequence of modes. To overcome this difficulty, we formulate the search for the optimal mode sequences offline, separately from the search for optimal control inputs online. Using tools from machine learning, this leads to a convex hybrid MPC program that can be solved in real-time. We validate our algorithm on a planar manipulation experimental setup where results show that the convex hybrid MPC formulation with learned modes achieves good closed-loop performance on a trajectory tracking problem.
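
The offline/online split can be sketched on a toy problem. Everything in it is hypothetical: a scalar tracking error x, two made-up modes called stick (x' = x + u) and slide (x' = x - 0.3 + 0.5u), a two-step horizon, and a quadratic cost. Once a mode schedule is fixed the problem is a convex quadratic program, solved exactly here by backward dynamic programming. Offline, a grid of states is labeled with its best schedule by enumeration; online, a 1-nearest-neighbor lookup stands in for the learned classifier, so only one convex problem is solved per control step. This is not the paper's pusher-slider model or its learning method.

```python
import itertools

# Hypothetical toy modes: next_x = x + drift + b * u, stored as (b, drift).
MODES = {"stick": (1.0, 0.0), "slide": (0.5, -0.3)}
HORIZON = 2

def solve_fixed_schedule(x0, schedule, q=1.0, rho=1.0):
    """With the mode schedule fixed, minimize sum_k q*x_{k+1}^2 + rho*u_k^2
    exactly by backward dynamic programming on V(x) = p*x^2 + r*x + c.
    Returns (optimal cost, first control input)."""
    p, r, c = 0.0, 0.0, 0.0  # terminal value is zero
    gain = offset = 0.0
    for mode in reversed(schedule):
        b, drift = MODES[mode]
        s = q + p                          # curvature of stage cost plus next value
        offset = drift + r / (2.0 * s)     # completes the square in the next state
        gain = s * b / (s * b * b + rho)   # optimal u = -gain * (x + offset)
        g = s * rho / (s * b * b + rho)
        p, r, c = g, 2.0 * g * offset, g * offset * offset + c - r * r / (4.0 * s)
    return p * x0 * x0 + r * x0 + c, -gain * (x0 + offset)

ALL_SCHEDULES = list(itertools.product(MODES, repeat=HORIZON))

def best_schedule(x0):
    """Exhaustive search over mode schedules (expensive, offline only)."""
    return min(ALL_SCHEDULES, key=lambda sch: solve_fixed_schedule(x0, sch)[0])

# Offline: label a grid of states with their best mode schedule.
LOOKUP = [(x, best_schedule(x)) for x in (i / 10 for i in range(-20, 21))]

def controller(x0):
    """Online: a 1-nearest-neighbor lookup stands in for the learned mode
    classifier, leaving a single convex problem to solve."""
    _, schedule = min(LOOKUP, key=lambda item: abs(item[0] - x0))
    _, u0 = solve_fixed_schedule(x0, schedule)
    return schedule, u0
```

For x near 1.0 the lookup returns the slide-slide schedule (its drift helps reduce a positive error), while for x near -1.0 it returns stick-stick, without any enumeration at run time.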

Feedback Control of the Pusher-Slider System: A Story of Hybrid and Underactuated Contact Dynamics

Nov 24, 2016

Francois Robert Hogan, Alberto Rodriguez

Abstract: This paper investigates real-time control strategies for dynamical systems that involve frictional contact interactions. Hybridness and underactuation are key characteristics of these systems that complicate the design of feedback controllers. In this research, we examine and test a novel feedback controller design on a planar pushing system, where the purpose is to control the motion of a sliding object on a flat surface using a point robotic pusher. The pusher-slider is a simple dynamical system that retains many of the challenges that are typical of robotic manipulation tasks. Our results show that a model predictive control approach used in tandem with integer programming offers a powerful solution to capture the dynamic constraints associated with the friction cone as well as the hybrid nature of the contact. In order to achieve real-time control, simplifications are proposed to speed up the integer program. The concept of a Family of Modes (FOM) is introduced to solve an online convex optimization problem by selecting a set of contact mode schedules that spans a large set of dynamic behaviors that can occur during the prediction horizon. The controller design is applied to stabilize the motion of a sliding object about a nominal trajectory, and to re-plan its trajectory in real time to follow a moving target. We validate the controller design through numerical simulations and experimental results on an industrial ABB IRB 120 robotic arm.
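
The Family of Modes idea can be sketched on a toy problem. Everything here is hypothetical: a scalar tracking error x, two made-up modes called stick (x' = x + u) and slide (x' = x - 0.3 + 0.5u), a two-step horizon, and a quadratic cost. Instead of letting an integer program search over every mode schedule, the controller evaluates a small hand-picked family of schedules, solves the convex problem for each (exactly, by backward dynamic programming), and applies the first input of the cheapest. The family, the dynamics and the function names are illustrative assumptions, not the paper's pusher-slider formulation.

```python
# Hypothetical toy modes: next_x = x + drift + b * u, stored as (b, drift).
MODES = {"stick": (1.0, 0.0), "slide": (0.5, -0.3)}

# Hypothetical hand-picked family of two-step mode schedules.
FAMILY = [("stick", "stick"), ("slide", "slide"), ("stick", "slide")]

def solve_fixed_schedule(x0, schedule, q=1.0, rho=1.0):
    """Convex subproblem for one fixed schedule: minimize
    sum_k q*x_{k+1}^2 + rho*u_k^2 by backward DP on V(x) = p*x^2 + r*x + c.
    Returns (optimal cost, first control input)."""
    p, r, c = 0.0, 0.0, 0.0  # terminal value is zero
    gain = offset = 0.0
    for mode in reversed(schedule):
        b, drift = MODES[mode]
        s = q + p
        offset = drift + r / (2.0 * s)
        gain = s * b / (s * b * b + rho)   # optimal u = -gain * (x + offset)
        g = s * rho / (s * b * b + rho)
        p, r, c = g, 2.0 * g * offset, g * offset * offset + c - r * r / (4.0 * s)
    return p * x0 * x0 + r * x0 + c, -gain * (x0 + offset)

def fom_controller(x0):
    """Solve one convex problem per schedule in the family; keep the cheapest."""
    best = None
    for schedule in FAMILY:
        cost, u0 = solve_fixed_schedule(x0, schedule)
        if best is None or cost < best[0]:
            best = (cost, schedule, u0)
    cost, schedule, u0 = best
    return schedule, u0, cost
```

For x = 1.0 the slide-slide schedule wins, because that mode's drift helps reduce a positive error, while for x = -1.0 the stick-stick schedule wins. Keeping both behaviors in the family covers these cases at the price of only three convex solves per step.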
