Dept. of Electronics, Information, and Bioengineering, Politecnico di Milano, Italy
Abstract:Despite impressive advancements in industrial collaborative robots, their potential remains largely untapped due to the difficulty of balancing human safety and comfort with fast production constraints. To help address this challenge, we present PRO-MIND, a novel human-in-the-loop framework that leverages valuable data about the human co-worker to optimise robot trajectories. By estimating human attention and mental effort, our method dynamically adjusts safety zones and enables on-the-fly alterations of the robot path to enhance human comfort and determine optimal stopping conditions. Moreover, we formulate a multi-objective optimisation that adapts the robot's trajectory execution time and smoothness to the current human psycho-physical stress, estimated from heart rate variability and frantic movements. These adaptations exploit the properties of B-spline curves to preserve continuity and smoothness, which are crucial factors in improving motion predictability and comfort. Evaluation in two realistic case studies showcases the framework's ability to contain the operators' workload and stress and to ensure their safety while enhancing human-robot productivity. Further strengths of PRO-MIND include its adaptability to each individual's specific needs and its sensitivity to variations in attention, mental effort, and stress during task execution.
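To illustrate the B-spline property the abstract relies on, the following is a minimal sketch (not the authors' implementation) of how uniformly scaling a clamped B-spline's knot vector stretches execution time while leaving the geometric path and its continuity untouched; the stress-to-slowdown mapping in `scaled_trajectory` is a hypothetical placeholder.

```python
# Minimal sketch: time-scaling a B-spline trajectory by scaling its knot vector.
# The path geometry and C^2 continuity are preserved; only the timing changes.
import numpy as np
from scipy.interpolate import BSpline

degree = 3
ctrl_pts = np.array([[0.0, 0.0], [0.2, 0.5], [0.6, 0.4], [0.9, 0.8], [1.0, 1.0]])
n = len(ctrl_pts)
# Clamped uniform knot vector over a nominal duration of 1 s.
knots = np.concatenate(([0.0] * degree,
                        np.linspace(0.0, 1.0, n - degree + 1),
                        [1.0] * degree))

def scaled_trajectory(stress: float) -> BSpline:
    """Stretch the time axis when estimated stress is high (stress in [0, 1])."""
    alpha = 1.0 + stress  # hypothetical mapping: up to 2x slower under full stress
    return BSpline(knots * alpha, ctrl_pts, degree)

traj = scaled_trajectory(stress=0.6)
t = np.linspace(0.0, traj.t[-1], 5)
print(traj(t))               # sampled positions: same path, slower timing
print(traj.derivative()(t))  # velocities shrink by a factor of 1/alpha
```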
Abstract:Robotic-assisted surgery (RAS) relies on accurate depth estimation for 3D reconstruction and visualization. While foundation models like Depth Anything Models (DAM) show promise, directly applying them to surgery often yields suboptimal results. Fully fine-tuning on limited surgical data can cause overfitting and catastrophic forgetting, compromising model robustness and generalization. Although Low-Rank Adaptation (LoRA) addresses some adaptation issues, its uniform parameter distribution neglects the inherent feature hierarchy, where earlier layers, learning more general features, require more parameters than later ones. To tackle this issue, we introduce Depth Anything in Robotic Endoscopic Surgery (DARES), a novel approach that employs a new adaptation technique, Vector Low-Rank Adaptation (Vector-LoRA), on DAM V2 to perform self-supervised monocular depth estimation in RAS scenes. Vector-LoRA enhances learning efficiency by allocating more parameters to earlier layers and gradually fewer to later ones. We also design a reprojection loss based on the multi-scale SSIM error to enhance depth perception by better tailoring the foundation model to the specific requirements of the surgical environment. The proposed method is validated on the SCARED dataset and demonstrates superior performance over recent state-of-the-art self-supervised monocular depth estimation techniques, achieving an improvement of 13.3% in the absolute relative error metric. The code and pre-trained weights are available at https://github.com/mobarakol/DARES.
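The core Vector-LoRA idea, a per-layer rank that decays with depth, can be sketched as below. This is an illustrative reconstruction, not the DARES code (see the repository for the actual implementation); the class and function names, the rank range, and the scaling constant are all hypothetical.

```python
# Illustrative sketch of the Vector-LoRA idea: assign a rank that decreases
# linearly with depth, so earlier layers get more trainable parameters.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update W + (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, rank: int, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the foundation-model weights frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

def vector_ranks(num_layers: int, r_first: int = 14, r_last: int = 2):
    """Linearly decaying rank schedule: more capacity in earlier layers."""
    return [round(r_first + (r_last - r_first) * i / (num_layers - 1))
            for i in range(num_layers)]

layers = [nn.Linear(64, 64) for _ in range(12)]
adapted = [LoRALinear(layer, r) for layer, r in zip(layers, vector_ranks(12))]
print(vector_ranks(12))  # e.g. [14, 13, 12, ..., 2]
```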
Abstract:This letter introduces an innovative visuo-haptic interface to control Mobile Collaborative Robots (MCR). Thanks to a passive detachable mechanism, the interface can be attached to or detached from a robot, offering two control modes: local control (attached) and teleoperation (detached). These modes are integrated with a robot whole-body controller and presented in a unified close- and far-proximity control framework for MCR. In an earlier iteration of this interface, the haptic component enabled users to execute intricate loco-manipulation tasks via admittance-type control, effectively decoupling task dynamics and enhancing human capabilities. This ongoing work proposes a novel design that additionally integrates a visual component. The design uses Visual-Inertial Odometry (VIO) for teleoperation, estimating the interface's pose through stereo cameras and an Inertial Measurement Unit (IMU). The estimated pose serves as the reference for the robot's end-effector in teleoperation mode. Hence, the interface offers complete flexibility and adaptability, enabling any user to operate an MCR seamlessly without expert knowledge. In this letter, we focus primarily on the new visual feature and first present a performance evaluation of different VIO-based methods for teleoperation. Next, the interface's usability is analysed in a home-care application and compared to an alternative based on a commercial MoCap system. Results show comparable performance in terms of accuracy, completion time, and usability. Nevertheless, the proposed interface is low-cost, poses minimal wearability constraints, and can be used anywhere and anytime without external devices or additional equipment, offering a versatile and accessible solution for teleoperation.
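A common way to turn a VIO-estimated pose into an end-effector reference, consistent with the scheme the abstract describes, is to anchor both frames when teleoperation is engaged and then apply the interface's relative motion to the end-effector. The sketch below is a hypothetical minimal version of this mapping, not the authors' controller; the anchor poses and scaling factor are illustrative.

```python
# Minimal sketch: map the relative motion of the VIO-tracked interface
# onto the robot end-effector reference pose.
import numpy as np
from scipy.spatial.transform import Rotation as R

def pose_to_T(p, q):
    """Build a 4x4 homogeneous transform from position p and quaternion q (x, y, z, w)."""
    T = np.eye(4)
    T[:3, :3] = R.from_quat(q).as_matrix()
    T[:3, 3] = p
    return T

# Frames captured once, at the moment teleoperation mode is engaged.
T_iface_0 = pose_to_T([0.0, 0.0, 0.0], [0, 0, 0, 1])  # interface pose at detach
T_ee_0 = pose_to_T([0.4, 0.0, 0.3], [0, 0, 0, 1])     # EE pose at detach

def ee_reference(T_iface_now, scale=1.0):
    """EE target = initial EE pose composed with the interface's relative motion."""
    T_rel = np.linalg.inv(T_iface_0) @ T_iface_now
    T_rel[:3, 3] *= scale  # optional workspace scaling of the translation
    return T_ee_0 @ T_rel

# Example: the interface has moved 5 cm forward since engagement.
T_now = pose_to_T([0.05, 0.0, 0.0], [0, 0, 0, 1])
print(ee_reference(T_now)[:3, 3])  # EE target translated accordingly
```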
Abstract:Humans' ability to smoothly switch between locomotion and manipulation is a remarkable feature of sensorimotor coordination. Learning and replicating such human-like strategies can lead to the development of more sophisticated robots capable of performing complex whole-body tasks in real-world environments. To this end, this paper proposes a combined learning and optimization framework for transferring humans' loco-manipulation soft-switching skills to mobile manipulators. The methodology starts from the collection of human demonstrations of a locomotion-integrated manipulation task through a vision system. The wrist and pelvis motions are then mapped to the mobile manipulator's End-Effector (EE) and mobile base. A kernelized movement primitive algorithm learns the wrist and pelvis trajectories and generalizes them to new desired points according to task requirements. The reference trajectories are then sent to a hierarchical quadratic programming controller, where the EE and mobile-base reference trajectories are provided as the first- and second-priority tasks, generating feasible and optimal joint-level commands. A locomotion-integrated pick-and-place task is executed to validate the proposed approach. After a human demonstrates the task, a mobile manipulator executes it under the same and new settings, grasping a bottle at non-zero velocity. The results show that the proposed approach successfully transfers human loco-manipulation skills to mobile manipulators, even with a different geometry.
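The two-priority structure can be illustrated with the classic closed-form null-space projection, a simplified stand-in for the hierarchical quadratic programming controller the paper actually uses: the EE task is tracked exactly, and the mobile-base task is resolved only within its null space. Dimensions and Jacobians below are hypothetical.

```python
# Simplified sketch of two-priority task resolution via null-space projection
# (a closed-form counterpart to hierarchical QP, not the paper's controller).
import numpy as np

def prioritized_velocities(J1, dx1, J2, dx2):
    """Joint velocities tracking task 1 exactly and task 2 as well as possible."""
    J1_pinv = np.linalg.pinv(J1)
    N1 = np.eye(J1.shape[1]) - J1_pinv @ J1        # null-space projector of task 1
    dq1 = J1_pinv @ dx1
    J2N = J2 @ N1
    dq2 = np.linalg.pinv(J2N) @ (dx2 - J2 @ dq1)   # secondary task, projected
    return dq1 + N1 @ dq2

rng = np.random.default_rng(0)
J_ee = rng.standard_normal((6, 10))    # EE Jacobian of a 10-DoF mobile manipulator
J_base = rng.standard_normal((3, 10))  # mobile-base task Jacobian
dq = prioritized_velocities(J_ee, np.ones(6) * 0.01, J_base, np.ones(3) * 0.01)
print(J_ee @ dq)  # matches the commanded EE task velocity (first priority)
```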
Abstract:Navigation through tortuous and deformable vessels using catheters with limited steering capability underscores the need for reliable path planning. State-of-the-art path planners do not fully account for the deformable nature of the environment. This work proposes a robust path planner based on a learning-from-demonstrations method, named Curriculum Generative Adversarial Imitation Learning (C-GAIL). This path planning framework takes into account the interaction between steerable catheters and vessel walls and the deformable properties of vessels. In-silico comparative experiments show that the proposed network achieves smaller targeting errors and a higher success rate than a state-of-the-art approach based on GAIL. In-vitro validation experiments demonstrate that the path generated by the proposed C-GAIL planner aligns better with the actual steering capability of the pneumatic artificial muscle-driven catheter used in this study. The proposed approach can therefore provide enhanced support to the user in navigating the catheter towards the target with greater precision than the conventional centerline-following technique, with targeting and tracking errors of 1.26$\pm$0.55 mm and 5.18$\pm$3.48 mm, respectively. The proposed path planning framework also exhibits superior performance in managing the uncertainty associated with vessel deformation, resulting in lower tracking errors.
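For readers unfamiliar with the GAIL component, the sketch below shows the generic mechanism it builds on: a discriminator is trained to separate expert demonstrations from policy rollouts, and its output supplies the imitation reward. This is a conceptual, heavily simplified illustration with made-up dimensions, not the C-GAIL network or its curriculum.

```python
# Conceptual sketch of the GAIL building block: a discriminator over
# state-action pairs whose output defines the imitation reward.
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.Tanh(),
            nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def imitation_reward(disc, s, a, eps=1e-8):
    """High when the discriminator mistakes the agent for the expert."""
    with torch.no_grad():
        return -torch.log(1.0 - disc(s, a) + eps)

disc = Discriminator(state_dim=6, action_dim=3)
bce = nn.BCELoss()
s_exp, a_exp = torch.randn(32, 6), torch.randn(32, 3)  # expert demonstrations
s_pol, a_pol = torch.randn(32, 6), torch.randn(32, 3)  # policy rollouts
loss = bce(disc(s_exp, a_exp), torch.ones(32, 1)) + \
       bce(disc(s_pol, a_pol), torch.zeros(32, 1))
loss.backward()
print(imitation_reward(disc, s_pol, a_pol).shape)  # per-sample rewards
```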
Abstract:During Percutaneous Nephrolithotomy (PCNL) operations, the surgeon is required to define the incision point on the patient's back, align the needle to a pre-planned path, and then perform the puncture. The procedure is currently performed manually using ultrasound or fluoroscopy imaging for needle orientation, which implies limited accuracy and low reproducibility. This work combines Augmented Reality (AR) visualisation through an optical see-through head-mounted display (OST-HMD) with a Human-Robot Collaboration (HRC) framework to enhance the surgeon's task-completion performance. In detail, Eye-to-Hand calibration, system registration, and hologram model registration are performed to realise visual guidance. A Cartesian impedance controller is used to guide the operator during the needle puncture task. Experiments are conducted to verify the system performance against conventional manual puncture procedures and a 2D monitor-based visualisation interface. The results show that the proposed framework achieves the lowest median and standard deviation of error across all experimental groups. Furthermore, the NASA-TLX user evaluation indicates that the proposed framework requires the lowest workload score for task completion compared to the other experimental setups. The proposed framework exhibits significant potential for clinical application in the PCNL task, as it enhances the surgeon's perception capability, facilitates collision-free needle insertion path planning, and minimises errors in task completion.
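A Cartesian impedance controller of the kind the abstract mentions renders a spring-damper behaviour around the planned insertion pose. The following is a minimal sketch of the translational part of such a law; the stiffness and damping values and the poses are illustrative, not the clinical system's parameters.

```python
# Minimal sketch: translational Cartesian impedance law F = K (x_d - x) + D (dx_d - dx),
# pulling the tool toward the pre-planned insertion point with a spring-damper response.
import numpy as np

K = np.diag([800.0, 800.0, 800.0])  # translational stiffness [N/m] (illustrative)
D = np.diag([60.0, 60.0, 60.0])     # damping [Ns/m] (illustrative)

def impedance_force(x, x_des, dx, dx_des=np.zeros(3)):
    """Commanded Cartesian force from position and velocity errors."""
    return K @ (x_des - x) + D @ (dx_des - dx)

x = np.array([0.40, 0.02, 0.30])      # current tool position [m]
x_des = np.array([0.40, 0.00, 0.30])  # pre-planned insertion point
dx = np.array([0.00, 0.01, 0.00])     # current tool velocity [m/s]
print(impedance_force(x, x_des, dx))  # force guiding the operator back to the path
```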
Abstract:Robot-assisted surgery is rapidly developing in the medical field, and the integration of augmented reality shows potential for improving surgeons' operating performance by providing additional visual information. In this paper, we propose a markerless augmented reality framework to enhance safety by avoiding intra-operative bleeding, a major risk caused by collisions between the surgical instruments and blood vessels. Advanced stereo reconstruction and segmentation networks are compared to identify the best combination for reconstructing the intra-operative blood vessel in 3D space for the registration of the pre-operative model, and minimum-distance detection between the instruments and the blood vessel is implemented. A robot-assisted lymphadenectomy is simulated on the da Vinci Research Kit in a dry lab, and ten human subjects performed this operation to explore the usability of the proposed framework. The results show that the augmented reality framework can help users avoid dangerous collisions between the instruments and the blood vessel without introducing extra workload. It provides a flexible framework that integrates augmented reality into the medical robot platform to enhance safety during the operation.
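Minimum-distance detection between an instrument and a reconstructed vessel is typically a nearest-neighbour query against the vessel point cloud. The sketch below shows one plausible way to do this with a KD-tree; the synthetic vessel geometry and the warning threshold are made up for illustration.

```python
# Minimal sketch: minimum-distance monitoring between the instrument tip and a
# reconstructed vessel point cloud using a KD-tree nearest-neighbour query.
import numpy as np
from scipy.spatial import cKDTree

# Reconstructed vessel represented as a 3D point cloud (here: a synthetic tube).
theta = np.linspace(0, 2 * np.pi, 200)
z = np.linspace(0, 0.05, 200)
vessel = np.column_stack([0.004 * np.cos(theta), 0.004 * np.sin(theta), z])
tree = cKDTree(vessel)

def check_clearance(tip, warn_dist=0.005):
    """Return the closest vessel point and flag if the tip is dangerously close."""
    dist, idx = tree.query(tip)
    return dist, vessel[idx], dist < warn_dist

dist, closest, too_close = check_clearance(np.array([0.006, 0.0, 0.02]))
print(f"distance = {dist * 1000:.2f} mm, warning = {too_close}")
```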
Abstract:Instance segmentation of surgical instruments is a long-standing research problem, crucial for the development of many applications in computer-assisted surgery. This problem is commonly tackled via fully supervised training of deep learning models, which requires expensive pixel-level annotations. In this work, we develop a framework for instance segmentation that does not rely on spatial annotations for training. Instead, our solution only requires binary tool masks, obtainable with recent unsupervised approaches, and binary tool presence labels, freely obtainable in robot-assisted surgery. Based on the binary mask information, our solution learns to extract individual tool instances from single frames and to encode each instance into a compact vector representation capturing its semantic features. Such representations guide the automatic selection of a tiny number of instances (only 8 in our experiments), which are displayed to a human operator for tool-type labelling. The gathered information is finally used to match each training instance with a binary tool presence label, providing an effective supervision signal to train a tool instance classifier. We validate our framework on the EndoVis 2017 and 2018 segmentation datasets. We provide results using binary masks obtained either by manual annotation or as predictions of an unsupervised binary segmentation model. The latter solution yields an instance segmentation approach completely free from spatial annotations, outperforming several state-of-the-art fully supervised segmentation approaches.
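Two steps of this pipeline have simple classical counterparts: splitting a binary mask into per-instance masks, and picking a few representative instances from their embeddings for human labelling. The sketch below illustrates both with connected components and k-means; it is an assumed simplification, not the authors' learned instance extractor or selection criterion.

```python
# Illustrative sketch: per-instance masks via connected components, plus
# selection of k representative instances by clustering their feature vectors.
import numpy as np
from scipy.ndimage import label
from sklearn.cluster import KMeans

def extract_instances(binary_mask):
    """Split a binary tool mask into per-instance masks (connected components)."""
    labelled, n = label(binary_mask)
    return [(labelled == i) for i in range(1, n + 1)]

def select_for_labelling(embeddings, k=8):
    """Choose the k instances closest to the k-means centroids of their embeddings."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)
    return [int(np.argmin(np.linalg.norm(embeddings - c, axis=1)))
            for c in km.cluster_centers_]

mask = np.zeros((64, 64), dtype=bool)
mask[5:20, 5:20] = True    # instance 1
mask[40:60, 30:50] = True  # instance 2
print(len(extract_instances(mask)))  # -> 2
emb = np.random.default_rng(0).standard_normal((100, 16))  # stand-in embeddings
print(select_for_labelling(emb))     # indices of 8 instances to show the operator
```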
Abstract:Handing objects to humans is an essential capability for collaborative robots. Previous research on human-robot handovers has focused on facilitating the performance of the human partner and possibly minimising the physical effort needed to grasp the object. However, purely altruistic robot behaviours may result in protracted and awkward robot motions, causing unpleasant sensations for the human partner and affecting perceived safety and social acceptance. This paper investigates whether transferring the cognitive science principle that "humans act coefficiently as a group" (i.e. simultaneously maximising the benefits of all agents involved) to human-robot cooperative tasks promotes a more seamless and natural interaction. Human-robot coefficiency is first modelled by identifying implicit indicators of human comfort and discomfort and by calculating the robot energy consumption in performing the desired trajectory. We then present a reinforcement learning approach that uses the human-robot coefficiency score as a reward to adapt and learn online the combination of robot interaction parameters that maximises such coefficiency. Results showed that, by acting coefficiently, the robot could meet the individual preferences of most subjects involved in the experiments, improve the perceived human comfort, and foster trust in the robotic partner.
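The online adaptation loop can be caricatured as a bandit over interaction-parameter combinations, rewarded by a coefficiency score that trades off human comfort against robot energy. The sketch below is a deliberately simplified stand-in for the paper's RL formulation; the parameter grid, reward weighting, and simulated comfort signal are all hypothetical.

```python
# Conceptual sketch: epsilon-greedy selection of robot interaction parameters,
# rewarded by coefficiency = human comfort minus weighted robot energy cost.
import numpy as np

rng = np.random.default_rng(1)
combos = [(v, s) for v in (0.2, 0.5, 0.8)   # velocity scaling (hypothetical)
                 for s in (0.1, 0.3)]       # smoothness weight (hypothetical)
q = np.zeros(len(combos))                   # running reward estimates
n = np.zeros(len(combos))

def coefficiency_reward(comfort: float, energy: float, w=0.5):
    return comfort - w * energy  # hypothetical weighting of the two terms

for step in range(200):
    a = rng.integers(len(combos)) if rng.random() < 0.1 else int(np.argmax(q))
    v, s = combos[a]
    comfort = 1.0 - abs(v - 0.5) + 0.1 * rng.standard_normal()  # simulated human
    energy = v ** 2 + s
    r = coefficiency_reward(comfort, energy)
    n[a] += 1
    q[a] += (r - q[a]) / n[a]               # incremental mean update

print("preferred parameters:", combos[int(np.argmax(q))])
```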
Abstract:Increased demand for less invasive procedures has accelerated the adoption of Intraluminal Procedures (IP) and Endovascular Interventions (EI), performed through body lumens and vessels. As navigation through lumens and vessels is quite complex, there is growing interest in establishing autonomous navigation techniques for IP and EI to reach the target area. Current research efforts are directed toward increasing the Level of Autonomy (LoA) during the navigation phase, and one key ingredient for autonomous navigation is Motion Planning (MP). This paper provides an overview of MP techniques, categorizing them based on LoA, and investigates advances across the different clinical scenarios. Through a systematic literature analysis using the PRISMA method, the study summarizes relevant works and investigates the clinical aim, LoA, adopted MP techniques, and validation types. We identify the limitations of the corresponding MP methods and provide directions for improving the robustness of the algorithms in dynamic intraluminal environments. MP for IP and EI can be classified into four subgroups: node-, sampling-, optimization-, and learning-based techniques, with a notable rise in learning-based approaches in recent years. One of the review's contributions is the identification of the limiting factors in IP and EI robotic systems that hinder higher levels of autonomous navigation. In the future, navigation is bound to become more autonomous, placing the clinician in a supervisory position to improve control precision and reduce workload.