Abstract:In recent years, advancements in optical tactile sensor technology have primarily centred on enhancing sensing precision and expanding the range of sensing modalities. To meet the requirements for more skilful manipulation, there should be a movement towards making tactile sensors more dynamic. In this paper, we introduce RoTip, a novel vision-based tactile sensor that is uniquely designed with an independently controlled joint and the capability to sense contact over its entire surface. The rotational capability of the sensor is particularly crucial for manipulating everyday objects, especially thin and flexible ones, as it enables the sensor to move while in contact with the object's surface. The manipulation experiments demonstrate the ability of our proposed RoTip to manipulate rigid and flexible objects, and its full-finger tactile feedback and active rotation capabilities have the potential to enable more complex and precise manipulation tasks.
Abstract:Cross-Modal Retrieval (CMR), which retrieves relevant items from one modality (e.g., audio) given a query in another modality (e.g., visual), has undergone significant advancements in recent years. This capability is crucial for robots to integrate and interpret information across diverse sensory inputs. However, the retrieval space in existing robotic CMR approaches often consists of only one modality, which limits the robot's performance. In this paper, we propose VAT-CMR, a novel CMR model that incorporates three different modalities, i.e., visual, audio and tactile, for enhanced multi-modal object retrieval. In this model, multi-modal representations are first fused to provide a holistic view of object features. To mitigate the semantic gaps between representations of different modalities, a dominant modality is then selected during the classification training phase to make the representations more distinctive and thereby improve retrieval performance. To evaluate our proposed approach, we conducted a case study, and the results demonstrate that our VAT-CMR model surpasses competing approaches. Further, our proposed dominant modality selection significantly enhances cross-retrieval accuracy.
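The fuse-then-retrieve idea described above can be illustrated with a minimal sketch. It assumes fusion by concatenating L2-normalized embeddings and ranking by cosine similarity; the abstract does not state the actual fusion operator, and the function names `fuse` and `retrieve` are hypothetical. The query is assumed to already live in the same space as the fused gallery.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def fuse(*modalities):
    """Assumed fusion: normalize each modality, concatenate, renormalize."""
    return l2_normalize(np.concatenate([l2_normalize(m) for m in modalities], axis=-1))

def retrieve(query, gallery, k=3):
    """Return indices of the k gallery items most similar to the query."""
    scores = l2_normalize(gallery) @ l2_normalize(query)  # cosine similarity
    return np.argsort(-scores)[:k]

# Toy gallery of 5 objects with random visual, audio and tactile embeddings.
rng = np.random.default_rng(0)
visual, audio, tactile = (rng.normal(size=(5, 8)) for _ in range(3))
gallery = fuse(visual, audio, tactile)
top = retrieve(gallery[2], gallery, k=1)
```

Querying with an item's own fused embedding returns that item first, which is a basic sanity check rather than an evaluation of cross-modal retrieval.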
Abstract:Optical tactile sensors provide robots with rich force information for robot grasping in unstructured environments. Fast and accurate calibration of three-dimensional contact forces is important both for new sensors and for existing tactile sensors that may have suffered damage or aging. However, the conventional neural-network-based force calibration method requires a large volume of force-labeled tactile images to minimize force prediction errors, which in turn calls for accurate Force/Torque measurement tools and a time-consuming data collection process. To address this challenge, we propose a novel deep domain-adaptation force calibration method, designed to transfer the force prediction ability from a calibrated optical tactile sensor to uncalibrated ones with various combinations of domain gaps, including marker presence, illumination condition, and elastomer modulus. Experimental results show the effectiveness of the proposed unsupervised force calibration method, with lowest force prediction errors of 0.102 N (3.4% of the full force range) for normal force, and 0.095 N (6.3%) and 0.062 N (4.1%) for shear forces along the x-axis and y-axis, respectively. This study presents a promising, general force calibration methodology for optical tactile sensors.
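The percentages quoted above relate each absolute error to a full-scale force range. As a rough sanity check, and assuming each percentage is simply error divided by range, the implied ranges can be back-computed from the rounded figures. The helper name `implied_range` is hypothetical, and the resulting values are estimates derived here, not numbers stated by the authors.

```python
def implied_range(error_n, percent):
    """Return the full-scale range (N) implied by an error and its percentage."""
    return error_n / (percent / 100.0)

# (error in newtons, percent of full range) as reported in the abstract
reported = {"normal": (0.102, 3.4), "shear_x": (0.095, 6.3), "shear_y": (0.062, 4.1)}
ranges = {axis: implied_range(e, p) for axis, (e, p) in reported.items()}

for axis, r in ranges.items():
    print(f"{axis}: about {r:.2f} N")
```

Under that assumption the normal-force range comes out near 3 N and both shear ranges near 1.5 N.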
Abstract:This paper introduces RoTipBot, a novel robotic system for handling thin, flexible objects. Unlike previous works, which are limited to singulating such objects using suction cups or soft grippers, RoTipBot can grasp and count multiple layers simultaneously, emulating human handling in various environments. Specifically, we develop a novel vision-based tactile sensor named RoTip that can rotate and sense contact information around its tip. Equipped with two RoTip sensors, RoTipBot feeds multiple layers of thin, flexible objects into the centre between its fingers, enabling effective grasping and counting. RoTip's tactile sensing ensures both fingers maintain good contact with the object, and an adjustment approach is designed to allow the gripper to adapt to changes in the object. Extensive experiments demonstrate the efficacy of the RoTip sensor and the RoTipBot approach. The results show that RoTipBot not only achieves a higher success rate but also grasps and counts multiple layers simultaneously, capabilities not possible with previous methods. Furthermore, RoTipBot operates up to three times faster than state-of-the-art methods. The success of RoTipBot paves the way for future research in object manipulation using mobilised tactile sensors. All the materials used in this paper are available at https://sites.google.com/view/rotipbot.
Abstract:Pareto Set Learning (PSL) is an emerging research area in multi-objective optimization, focusing on training neural networks to learn the mapping from preference vectors to Pareto optimal solutions. However, existing PSL methods are limited to addressing a single Multi-objective Optimization Problem (MOP) at a time. When faced with multiple MOPs, this limitation not only leads to significant inefficiencies but also fails to exploit the potential synergies across varying MOPs. In this paper, we propose a Collaborative Pareto Set Learning (CoPSL) framework, which simultaneously learns the Pareto sets of multiple MOPs in a collaborative manner. CoPSL employs an architecture consisting of shared and MOP-specific layers, where shared layers aim to capture common relationships among MOPs collaboratively, and MOP-specific layers process these relationships to generate solution sets for each MOP. This collaborative approach enables CoPSL to efficiently learn the Pareto sets of multiple MOPs in a single run while leveraging the relationships among various MOPs. To further understand these relationships, we experimentally demonstrate that there exist shareable representations among MOPs. Leveraging these collaboratively shared representations can effectively improve the capability to approximate Pareto sets. Extensive experiments underscore the superior efficiency and robustness of CoPSL in approximating Pareto sets compared to state-of-the-art approaches on a variety of synthetic and real-world MOPs. Code is available at https://github.com/ckshang/CoPSL.
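The shared-plus-specific layout described above can be sketched as a forward pass in NumPy. This is an illustrative, untrained toy under stated assumptions, not the CoPSL implementation: the class name `SharedPlusSpecificNet`, the single shared layer, the layer sizes, the sigmoid output that maps solutions into [0, 1], and the restriction that all MOPs have the same number of objectives are all choices made here.

```python
import numpy as np

class SharedPlusSpecificNet:
    """Toy network: one shared layer followed by one output head per MOP."""

    def __init__(self, n_obj, hidden, sol_dims, seed=0):
        rng = np.random.default_rng(seed)
        # Shared layer: sees preference vectors from every MOP.
        self.W_shared = rng.normal(0.0, 0.1, (n_obj, hidden))
        self.b_shared = np.zeros(hidden)
        # MOP-specific heads: one (weights, bias) pair per problem.
        self.heads = [(rng.normal(0.0, 0.1, (hidden, d)), np.zeros(d))
                      for d in sol_dims]

    def forward(self, prefs, mop_index):
        """Map preference vectors (batch, n_obj) to solutions for one MOP."""
        h = np.tanh(prefs @ self.W_shared + self.b_shared)
        W, b = self.heads[mop_index]
        return 1.0 / (1.0 + np.exp(-(h @ W + b)))  # squash into (0, 1)

# Two bi-objective MOPs with 3 and 5 decision variables respectively.
net = SharedPlusSpecificNet(n_obj=2, hidden=8, sol_dims=[3, 5])
prefs = np.array([[0.5, 0.5], [0.2, 0.8]])
sols_a = net.forward(prefs, mop_index=0)
sols_b = net.forward(prefs, mop_index=1)
```

In a trained version, gradients from every MOP would update the shared weights while each head would only be updated by its own problem, which is where the collaborative effect would come from.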
Abstract:Ensuring traffic safety is crucial, which necessitates the detection and prevention of road surface defects. As a result, there has been a growing interest in the literature on the subject, leading to the development of various road surface defect detection methods. The methods for detecting road defects can be categorised in various ways depending on the input data types or training methodologies. The predominant approach involves image-based methods, which analyse pixel intensities and surface textures to identify defects. Despite their popularity, image-based methods share the distinct limitation of vulnerability to weather and lighting changes. To address this issue, researchers have explored the use of additional sensors, such as laser scanners or LiDARs, providing explicit depth information to enable the detection of defects in terms of scale and volume. However, the use of data beyond images has not been sufficiently investigated. In this survey paper, we provide a comprehensive review of road surface defect detection studies, categorising them based on input data types and methodologies used. Additionally, we review recently proposed non-image-based methods and discuss several challenges and open problems associated with these techniques.
Abstract:Objectives: Artificial intelligence (AI) applications utilizing electronic health records (EHRs) have gained popularity, but they also introduce various types of bias. This study aims to systematically review the literature that addresses bias in AI research utilizing EHR data. Methods: A systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guideline. We retrieved articles published between January 1, 2010, and October 31, 2022, from PubMed, Web of Science, and the Institute of Electrical and Electronics Engineers. We defined six major types of bias and summarized the existing approaches to bias handling. Results: Out of the 252 retrieved articles, 20 met the inclusion criteria for the final review. Five of the six bias types were covered in this review: eight studies analyzed selection bias; six, implicit bias; five, confounding bias; four, measurement bias; and two, algorithmic bias. For bias handling approaches, ten studies identified bias during model development, while seventeen presented methods to mitigate the bias. Discussion: Bias may infiltrate the AI application development process at various stages. Although this review discusses methods for addressing bias at different development stages, there is room for implementing additional effective approaches. Conclusion: Despite growing attention to bias in healthcare AI, research using EHR data on this topic is still limited. Detecting and mitigating AI bias with EHR data continues to pose challenges. Further research is needed to develop a standardized method that is generalizable and interpretable to detect, mitigate and evaluate bias in medical AI.
Abstract:The missing signal caused by the objects being occluded or an unstable sensor is a common challenge during data collection. Such missing signals will adversely affect the results obtained from the data, and this issue is observed more frequently in robotic tactile perception. In tactile perception, due to the limited working space and the dynamic environment, the contact between the tactile sensor and the object is frequently insufficient and unstable, which causes the partial loss of signals, thus leading to incomplete tactile data. The tactile data will therefore contain fewer tactile cues with low information density. In this paper, we propose a tactile representation learning method, named TacMAE, based on Masked Autoencoder to address the problem of incomplete tactile data in tactile perception. In our framework, a portion of the tactile image is masked out to simulate the missing contact region. By reconstructing the missing signals in the tactile image, the trained model can achieve a high-level understanding of surface geometry and tactile properties from limited tactile cues. The experimental results of tactile texture recognition show that our proposed TacMAE can achieve a high recognition accuracy of 71.4% in the zero-shot transfer and 85.8% after fine-tuning, which are 15.2% and 8.2% higher than the results without using masked modeling. The extensive experiments on YCB objects demonstrate the knowledge transferability of our proposed method and the potential to improve efficiency in tactile exploration.
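The masking step described above, where part of a tactile image is hidden to simulate a missing contact region, can be sketched as follows. This is a minimal illustration under assumptions made here: square non-overlapping patches, uniform random selection, and zero-filling of masked pixels. The function name `mask_patches` and its default values are hypothetical and are not taken from TacMAE.

```python
import numpy as np

def mask_patches(image, patch=16, ratio=0.5, seed=0):
    """Zero out a random subset of non-overlapping square patches.

    Returns the masked image and a boolean grid (True = patch kept).
    """
    h, w = image.shape[:2]
    gh, gw = h // patch, w // patch              # patch grid size
    rng = np.random.default_rng(seed)
    n_mask = int(round(ratio * gh * gw))         # number of patches to hide
    masked_idx = rng.choice(gh * gw, size=n_mask, replace=False)
    keep = np.ones(gh * gw, dtype=bool)
    keep[masked_idx] = False
    keep_grid = keep.reshape(gh, gw)
    # Expand the patch-level grid to a pixel-level mask.
    pixel_keep = np.repeat(np.repeat(keep_grid, patch, axis=0), patch, axis=1)
    out = image.copy()
    out[:gh * patch, :gw * patch][~pixel_keep] = 0
    return out, keep_grid

# A 64x64 single-channel image of ones, with half of its 4x4 patch grid hidden.
image = np.ones((64, 64))
masked, keep_grid = mask_patches(image, patch=16, ratio=0.5)
```

A reconstruction model would then be trained to predict the hidden patches from the visible ones; that training loop is outside the scope of this sketch.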
Abstract:We present a metaheuristic algorithm for non-convex optimization, based on the training of a deep generative network, which enables effective searching within continuous, ultra-high dimensional landscapes. During network training, populations of sampled local gradients are utilized within a customized loss function to evolve the network output distribution function towards one peaked at high-performing optima. The deep network architecture is tailored to support progressive growth over the course of training, which allows the algorithm to manage the curse of dimensionality characteristic of high-dimensional landscapes. We apply our concept to a range of standard optimization problems with dimensions as high as one thousand and show that our method performs better with fewer function evaluations compared to state-of-the-art algorithm benchmarks. We also discuss the role of deep network over-parameterization, loss function engineering, and proper network architecture selection in optimization, and why the required batch size of sampled local gradients is independent of problem dimension. These concepts form the foundation for a new class of algorithms that utilize customizable and expressive deep generative networks to solve non-convex optimization problems.
Abstract:Tactile sensing plays an irreplaceable role in robotic material recognition. It enables robots to distinguish material properties such as their local geometry and textures, especially for materials like textiles. However, most tactile recognition methods can only classify known materials that have been touched and for which tactile training data exist; they cannot classify unknown materials for which no tactile training data are available. To solve this problem, we propose a tactile zero-shot learning framework to recognise unknown materials when they are touched for the first time, without requiring tactile training samples. The visual modality, providing tactile cues from sight, and semantic attributes, giving high-level characteristics, are combined to bridge the gap between touched classes and untouched classes. A generative model is learnt to synthesise tactile features according to corresponding visual images and semantic embeddings, and then a classifier can be trained using the synthesised tactile features of untouched materials for zero-shot recognition. Extensive experiments demonstrate that our proposed multimodal generative model can achieve a high recognition accuracy of 83.06% in classifying materials that were not touched before. The robotic experiment demo and the dataset are available at https://sites.google.com/view/multimodalzsl.