Abstract:In this paper, we address the Sim2Real gap in the field of vision-based tactile sensors for classifying object surfaces. We train a Diffusion Model to bridge this gap using a relatively small dataset of real-world images randomly collected from unlabeled everyday objects via the DIGIT sensor. Subsequently, we employ a simulator to generate images by uniformly sampling the surface of objects from the YCB Model Set. These simulated images are then translated into the real domain using the Diffusion Model and automatically labeled to train a classifier. During this training, we further align features of the two domains using an adversarial procedure. Our evaluation is conducted on a dataset of tactile images obtained from a set of ten 3D printed YCB objects. The results reveal a total accuracy of 81.9%, a significant improvement compared to the 34.7% achieved by the classifier trained solely on simulated images. This demonstrates the effectiveness of our approach. We further validate our approach using the classifier on a 6D object pose estimation task from tactile data.
Abstract:In this paper, we address the problem of estimating the in-hand 6D pose of an object in contact with multiple vision-based tactile sensors. We reason on the possible spatial configurations of the sensors along the object surface. Specifically, we filter contact hypotheses using geometric reasoning and a Convolutional Neural Network (CNN), trained on simulated object-agnostic images, to promote those that better comply with the actual tactile images from the sensors. We use the selected sensors configurations to optimize over the space of 6D poses using a Gradient Descent-based approach. We finally rank the obtained poses by penalizing those that are in collision with the sensors. We carry out experiments in simulation using the DIGIT vision-based sensor with several objects, from the standard YCB model set. The results demonstrate that our approach estimates object poses that are compatible with actual object-sensor contacts in $87.5\%$ of cases while reaching an average positional error in the order of $2$ centimeters. Our analysis also includes qualitative results of experiments with a real DIGIT sensor.
Abstract:6D object pose tracking has been extensively studied in the robotics and computer vision communities. The most promising solutions, leveraging on deep neural networks and/or filtering and optimization, exhibit notable performance on standard benchmarks. However, to our best knowledge, these have not been tested thoroughly against fast object motions. Tracking performance in this scenario degrades significantly, especially for methods that do not achieve real-time performance and introduce non negligible delays. In this work, we introduce ROFT, a Kalman filtering approach for 6D object pose and velocity tracking from a stream of RGB-D images. By leveraging real-time optical flow, ROFT synchronizes delayed outputs of low frame rate Convolutional Neural Networks for instance segmentation and 6D object pose estimation with the RGB-D input stream to achieve fast and precise 6D object pose and velocity tracking. We test our method on a newly introduced photorealistic dataset, Fast-YCB, which comprises fast moving objects from the YCB model set, and on the dataset for object and hand pose estimation HO-3D. Results demonstrate that our approach outperforms state-of-the-art methods for 6D object pose tracking, while also providing 6D object velocity tracking. A video showing the experiments is provided as supplementary material.