Abstract:This paper presents T3: Transferable Tactile Transformers, a framework for tactile representation learning that scales across multi-sensors and multi-tasks. T3 is designed to overcome the contemporary issue that camera-based tactile sensing is extremely heterogeneous, i.e. sensors are built into different form factors, and existing datasets were collected for disparate tasks. T3 captures the shared latent information across different sensor-task pairings by constructing a shared trunk transformer with sensor-specific encoders and task-specific decoders. The pre-training of T3 utilizes a novel Foundation Tactile (FoTa) dataset, which is aggregated from several open-sourced datasets and it contains over 3 million data points gathered from 13 sensors and 11 tasks. FoTa is the largest and most diverse dataset in tactile sensing to date and it is made publicly available in a unified format. Across various sensors and tasks, experiments show that T3 pre-trained with FoTa achieved zero-shot transferability in certain sensor-task pairings, can be further fine-tuned with small amounts of domain-specific data, and its performance scales with bigger network sizes. T3 is also effective as a tactile encoder for long horizon contact-rich manipulation. Results from sub-millimeter multi-pin electronics insertion tasks show that T3 achieved a task success rate 25% higher than that of policies trained with tactile encoders trained from scratch, or 53% higher than without tactile sensing. Data, code, and model checkpoints are open-sourced at https://t3.alanz.info.
Abstract:Many robotic hands currently rely on extremely dexterous robotic fingers and a thumb joint to envelop themselves around an object. Few hands focus on the palm even though human hands greatly benefit from their central fold and soft surface. As such, we develop a novel structurally compliant soft palm, which enables more surface area contact for the objects that are pressed into it. Moreover, this design, along with the development of a new low-cost, flexible illumination system, is able to incorporate a high-resolution tactile sensing system inspired by the GelSight sensors. Concurrently, we design RObotic Modular Endoskeleton Optical (ROMEO) fingers, which are underactuated two-segment soft fingers that are able to house the new illumination system, and we integrate them into these various palm configurations. The resulting robotic hand is slightly bigger than a baseball and represents one of the first soft robotic hands with actuated fingers and a passively compliant palm, all of which have high-resolution tactile sensing. This design also potentially helps researchers discover and explore more soft-rigid tactile robotic hand designs with greater capabilities in the future. The supplementary video can be found here: https://youtu.be/RKfIFiewqsg
Abstract:Compliant grippers enable robots to work with humans in unstructured environments. In general, these grippers can improve with tactile sensing to estimate the state of objects around them to precisely manipulate objects. However, co-designing compliant structures with high-resolution tactile sensing is a challenging task. We propose a simulation framework for the end-to-end forward design of GelSight Fin Ray sensors. Our simulation framework consists of mechanical simulation using the finite element method (FEM) and optical simulation including physically based rendering (PBR). To simulate the fluorescent paint used in these GelSight Fin Rays, we propose an efficient method that can be directly integrated in PBR. Using the simulation framework, we investigate design choices available in the compliant grippers, namely gel pad shapes, illumination conditions, Fin Ray gripper sizes, and Fin Ray stiffness. This infrastructure enables faster design and prototype time frames of new Fin Ray sensors that have various sensing areas, ranging from 48 mm $\times$ \18 mm to 70 mm $\times$ 35 mm. Given the parameters we choose, we can thus optimize different Fin Ray designs and show their utility in grasping day-to-day objects.
Abstract:Training general robotic policies from heterogeneous data for different tasks is a significant challenge. Existing robotic datasets vary in different modalities such as color, depth, tactile, and proprioceptive information, and collected in different domains such as simulation, real robots, and human videos. Current methods usually collect and pool all data from one domain to train a single policy to handle such heterogeneity in tasks and domains, which is prohibitively expensive and difficult. In this work, we present a flexible approach, dubbed Policy Composition, to combine information across such diverse modalities and domains for learning scene-level and task-level generalized manipulation skills, by composing different data distributions represented with diffusion models. Our method can use task-level composition for multi-task manipulation and be composed with analytic cost functions to adapt policy behaviors at inference time. We train our method on simulation, human, and real robot data and evaluate in tool-use tasks. The composed policy achieves robust and dexterous performance under varying scenes and tasks and outperforms baselines from a single data source in both simulation and real-world experiments. See https://liruiw.github.io/policycomp for more details .
Abstract:This paper presents GelSight Svelte Hand, a novel 3-finger 2-DoF tactile robotic hand that is capable of performing precision grasps, power grasps, and intermediate grasps. Rich tactile signals are obtained from one camera on each finger, with an extended sensing area similar to the full length of a human finger. Each finger of GelSight Svelte Hand is supported by a semi-rigid endoskeleton and covered with soft silicone materials, which provide both rigidity and compliance. We describe the design, fabrication, functionalities, and tactile sensing capability of GelSight Svelte Hand in this paper. More information is available on our website: \url{https://gelsight-svelte.alanz.info}.
Abstract:Camera-based tactile sensing is a low-cost, popular approach to obtain highly detailed contact geometry information. However, most existing camera-based tactile sensors are fingertip sensors, and longer fingers often require extraneous elements to obtain an extended sensing area similar to the full length of a human finger. Moreover, existing methods to estimate proprioceptive information such as total forces and torques applied on the finger from camera-based tactile sensors are not effective when the contact geometry is complex. We introduce GelSight Svelte, a curved, human finger-sized, single-camera tactile sensor that is capable of both tactile and proprioceptive sensing over a large area. GelSight Svelte uses curved mirrors to achieve the desired shape and sensing coverage. Proprioceptive information, such as the total bending and twisting torques applied on the finger, is reflected as deformations on the flexible backbone of GelSight Svelte, which are also captured by the camera. We train a convolutional neural network to estimate the bending and twisting torques from the captured images. We conduct gel deformation experiments at various locations of the finger to evaluate the tactile sensing capability and proprioceptive sensing accuracy. To demonstrate the capability and potential uses of GelSight Svelte, we conduct an object holding task with three different grasping modes that utilize different areas of the finger. More information is available on our website: https://gelsight-svelte.alanz.info
Abstract:Camera-based tactile sensors have shown great promise in enhancing a robot's ability to perform a variety of dexterous manipulation tasks. Advantages of their use can be attributed to the high resolution tactile data and 3D depth map reconstructions they can provide. Unfortunately, many of these tactile sensors use either a flat sensing surface, sense on only one side of the sensor's body, or have a bulky form-factor, making it difficult to integrate the sensors with a variety of robotic grippers. Of the camera-based sensors that do have all-around, curved sensing surfaces, many cannot provide 3D depth maps; those that do often require optical designs specified to a particular sensor geometry. In this work, we introduce GelSight360, a fingertip-like, omnidirectional, camera-based tactile sensor capable of producing depth maps of objects deforming the sensor's surface. In addition, we introduce a novel cross-LED lighting scheme that can be implemented in different all-around sensor geometries and sizes, allowing the sensor to easily be reconfigured and attached to different grippers of varying DOFs. With this work, we enable roboticists to quickly and easily customize high resolution tactile sensors to fit their robotic system's needs.
Abstract:We describe a novel three-finger robot hand that has high resolution tactile sensing along the entire length of each finger. The fingers are compliant, constructed with a soft shell supported with a flexible endoskeleton. Each finger contains two cameras, allowing tactile data to be gathered along the front and side surfaces of the fingers. The gripper can perform an enveloping grasp of an object and extract a large amount of rich tactile data in a single grasp. By capturing data from many parts of the grasped object at once, we can do object recognition with a single grasp rather than requiring multiple touches. We describe our novel design and construction techniques which allow us to simultaneously satisfy the requirements of compliance and strength, and high resolution tactile sensing over large areas. The supplementary video can be found here: https://youtu.be/H1OYADtgj9k
Abstract:The synthesis of tactile sensing with compliance is essential to many fields, from agricultural usages like fruit picking, to sustainability practices such as sorting recycling, to the creation of safe home-care robots for the elderly to age with dignity. From tactile sensing, we can discern material properties, recognize textures, and determine softness, while with compliance, we are able to securely and safely interact with the objects and the environment around us. These two abilities can culminate into a useful soft robotic gripper, such as the original GelSight Fin Ray, which is able to grasp a large variety of different objects and also perform a simple household manipulation task: wine glass reorientation. Although the original GelSight Fin Ray solves the problem of interfacing a generally rigid, high-resolution sensor with a soft, compliant structure, we can improve the robustness of the sensor and implement techniques that make such camera-based tactile sensors applicable to a wider variety of soft robot designs. We first integrate flexible mirrors and incorporate the rigid electronic components into the base of the gripper, which greatly improves the compliance of the Fin Ray structure. Then, we synthesize a flexible and high-elongation silicone adhesive-based fluorescent paint, which can provide good quality 2D tactile localization results for our sensor. Finally, we incorporate all of these techniques into a new design: the Baby Fin Ray, which we use to dig through clutter, and perform successful classification of nuts in their shells. The supplementary video can be found here: https://youtu.be/_oD_QFtYTPM
Abstract:In this paper, we address the problem of using visuo-tactile feedback for 6-DoF localization and 3D reconstruction of unknown in-hand objects. We propose FingerSLAM, a closed-loop factor graph-based pose estimator that combines local tactile sensing at finger-tip and global vision sensing from a wrist-mount camera. FingerSLAM is constructed with two constituent pose estimators: a multi-pass refined tactile-based pose estimator that captures movements from detailed local textures, and a single-pass vision-based pose estimator that predicts from a global view of the object. We also design a loop closure mechanism that actively matches current vision and tactile images to previously stored key-frames to reduce accumulated error. FingerSLAM incorporates the two sensing modalities of tactile and vision, as well as the loop closure mechanism with a factor graph-based optimization framework. Such a framework produces an optimized pose estimation solution that is more accurate than the standalone estimators. The estimated poses are then used to reconstruct the shape of the unknown object incrementally by stitching the local point clouds recovered from tactile images. We train our system on real-world data collected with 20 objects. We demonstrate reliable visuo-tactile pose estimation and shape reconstruction through quantitative and qualitative real-world evaluations on 6 objects that are unseen during training.