Abstract: Purpose: Surgical phase recognition (SPR) is an integral component of surgical data science, enabling high-level surgical analysis. End-to-end trained neural networks that predict surgical phase directly from videos have shown excellent performance on benchmarks. However, these models struggle with robustness due to non-causal associations in the training set, resulting in poor generalizability. Our goal is to improve model robustness to variations in the surgical videos by leveraging the digital twin (DT) paradigm -- an intermediary layer that separates high-level analysis (SPR) from low-level processing (geometric understanding). This approach takes advantage of recent vision foundation models that ensure reliable low-level scene understanding to craft DT-based scene representations supporting various high-level tasks. Methods: We present a DT-based framework for SPR from videos. The framework employs vision foundation models to extract representations, which we embed in place of raw video inputs in the state-of-the-art Surgformer model. The framework is trained on the Cholec80 dataset and evaluated on out-of-distribution (OOD) and corrupted test samples. Results: In contrast to the vulnerability of the baseline model, our framework demonstrates strong robustness on both OOD and corrupted samples, with a video-level accuracy of 51.1% on the challenging CRCD dataset, 96.0% on an internal robotics training dataset, and 64.4% on a highly corrupted Cholec80 test set. Conclusion: Our findings support the thesis that DT-based scene representations are effective in enhancing model robustness. Future work will seek to improve feature informativeness, automate feature extraction, and incorporate interpretability for a more comprehensive framework.
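A minimal sketch of the architectural idea as we read it: a frozen vision foundation model supplies per-frame scene representations, and only a lightweight temporal head is trained for phase recognition, so raw pixels never reach the trainable component. The module sizes, feature dimensionality, and head design below are our assumptions for illustration, not the paper's released code.

```python
import torch
import torch.nn as nn

NUM_PHASES = 7       # Cholec80 defines 7 surgical phases
FEATURE_DIM = 768    # assumed dimensionality of the foundation-model features

class PhaseHead(nn.Module):
    """Temporal transformer over precomputed frame representations."""
    def __init__(self, dim=FEATURE_DIM, num_phases=NUM_PHASES):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(dim, num_phases)

    def forward(self, feats):             # feats: (batch, time, dim)
        h = self.encoder(feats)           # temporal context over the clip
        return self.classifier(h[:, -1])  # phase logits for the last frame

frames = torch.randn(2, 16, 3, 224, 224)  # (batch, time, C, H, W) raw clip
with torch.no_grad():
    # A frozen foundation-model extractor would map `frames` to DT-based
    # representations; random features stand in for its output here.
    feats = torch.randn(2, 16, FEATURE_DIM)

logits = PhaseHead()(feats)
print(logits.shape)  # torch.Size([2, 7])
```

Decoupling the trainable head from pixel space is what the abstract credits for robustness: corruptions that perturb pixels but not the recovered scene geometry leave the head's inputs largely unchanged.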
Abstract: Vascular anastomosis, the surgical connection of blood vessels, is essential in procedures such as organ transplants and reconstructive surgeries. The precision required limits accessibility due to the extensive training needed, with manual suturing leading to variable outcomes and revision rates of up to 7.9%. Existing robotic systems, while promising, are either fully teleoperated or lack the capabilities necessary for autonomous vascular anastomosis. We present the Micro Smart Tissue Autonomous Robot (micro-STAR), an autonomous robotic system designed to perform vascular anastomosis on small-diameter vessels. The micro-STAR system integrates a novel suturing tool equipped with an Optical Coherence Tomography (OCT) fiber-optic sensor and a microcamera, enabling real-time tissue detection and classification. Our system autonomously places sutures and manipulates tissue with minimal human intervention. In an ex vivo study, micro-STAR achieved outcomes competitive with experienced surgeons in terms of leak pressure, lumen reduction, and suture placement variation, completing 90% of sutures without human intervention. This represents the first instance of a robotic system autonomously performing vascular anastomosis on real tissue, offering significant potential for improving surgical precision and expanding access to high-quality care.
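As a toy illustration of the kind of OCT-based tissue sensing the abstract mentions: an A-scan (the depth-intensity profile returned by the fiber-optic sensor) can be scanned for its first strong reflection to estimate distance to the tissue surface. The threshold, pixel pitch, and synthetic signal below are invented for the example and are not micro-STAR's actual processing.

```python
import numpy as np

def detect_surface(a_scan, pixel_pitch_um=3.5, threshold=0.6):
    """Return distance to the first reflective surface in an OCT A-scan,
    or None if no sample exceeds the detection threshold."""
    idx = np.argmax(a_scan > threshold)   # first index above threshold
    if a_scan[idx] <= threshold:
        return None                       # no tissue within sensing range
    return idx * pixel_pitch_um           # depth in micrometers

# Synthetic A-scan: noise floor plus a reflection ~350 um from the fiber tip.
rng = np.random.default_rng(0)
a_scan = 0.1 * rng.random(512)
a_scan[100] = 0.9
print(f"surface at {detect_surface(a_scan):.0f} um")  # ~350 um
```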
Abstract: We explore whether surgical manipulation tasks can be learned on the da Vinci robot via imitation learning. However, the da Vinci system presents unique challenges that hinder straightforward implementation of imitation learning. Notably, its forward kinematics is inconsistent due to imprecise joint measurements, and naively training a policy on such approximate kinematics data often leads to task failure. To overcome this limitation, we introduce a relative action formulation that enables successful policy training and deployment using the approximate kinematics data. A promising outcome of this approach is that the large repository of clinical data, which contains approximate kinematics, may be directly utilized for robot learning without further corrections. We demonstrate our findings through successful execution of three fundamental surgical tasks: tissue manipulation, needle handling, and knot-tying.
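A minimal sketch of the relative-action idea as described: instead of commanding absolute end-effector poses (which inherit the robot's kinematic bias), actions are expressed relative to the current measured pose, so a fixed offset in the forward kinematics appears on both sides and cancels. The function names and SE(3)-as-4x4 convention are our assumptions.

```python
import numpy as np

def absolute_to_relative(poses):
    """Convert a demonstrated pose trajectory (list of 4x4 transforms) to
    relative actions: A_t = T_t^{-1} @ T_{t+1}, expressed in frame t."""
    return [np.linalg.inv(poses[t]) @ poses[t + 1] for t in range(len(poses) - 1)]

def apply_relative(current_pose, action):
    """At deployment, compose the predicted relative action with the
    currently measured pose instead of tracking an absolute target."""
    return current_pose @ action

# Toy check: a constant-offset "miscalibration" leaves relative actions unchanged.
T = [np.eye(4) for _ in range(3)]
T[1][:3, 3] = [0.01, 0.0, 0.0]
T[2][:3, 3] = [0.02, 0.0, 0.0]
offset = np.eye(4); offset[:3, 3] = [0.1, 0.1, 0.1]    # hypothetical fixed bias
biased = [offset @ t for t in T]
print(np.allclose(absolute_to_relative(T)[0],
                  absolute_to_relative(biased)[0]))    # True
```

The check makes the abstract's claim concrete: since (offset T_t)^{-1} (offset T_{t+1}) = T_t^{-1} T_{t+1}, demonstrations recorded with biased kinematics yield the same relative actions as unbiased ones.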
Abstract: The absence of openly accessible data and specialized foundation models is a major barrier to computational research in surgery. Toward this, (i) we open-source the largest dataset of general surgery videos to date, consisting of 680 hours of surgical videos, including data from robotic and laparoscopic techniques across 28 procedures; (ii) we propose a technique for video pre-training a general surgery vision transformer (GSViT) on surgical videos based on forward video prediction that can run in real time for surgical applications, toward which we open-source the code and weights of GSViT; (iii) we also release code and weights for procedure-specific fine-tuned versions of GSViT across 10 procedures; (iv) we demonstrate GSViT on the Cholec80 phase annotation task, where it outperforms state-of-the-art single-frame predictors.
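The pre-training objective the abstract names, forward video prediction, amounts to a self-supervised loop in which the model predicts frame t+1 from frame t. The tiny convolutional predictor and synthetic clips below are placeholders for the released GSViT architecture and the surgical video data, included only to make the objective concrete.

```python
import torch
import torch.nn as nn

predictor = nn.Sequential(            # stand-in for the vision transformer
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
)
opt = torch.optim.AdamW(predictor.parameters(), lr=1e-4)

clip = torch.rand(4, 8, 3, 64, 64)    # (batch, time, C, H, W) synthetic "video"
for t in range(clip.shape[1] - 1):
    pred = predictor(clip[:, t])                  # predict frame t+1 from frame t
    loss = nn.functional.mse_loss(pred, clip[:, t + 1])
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final next-frame MSE: {loss.item():.4f}")
```

Because the supervision signal is the video itself, this objective scales directly with the 680 hours of unlabeled footage, which is the point of releasing the dataset and pre-training recipe together.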
Abstract: Prostate cancer diagnosis continues to encounter challenges, often due to imprecise needle placement in standard biopsies. Several control strategies have been developed to compensate for needle tip prediction inaccuracies; however, none have been compared against each other, and it is unclear whether any of them can be safely and universally applied in clinical settings. This paper compares the performance of two resolved-rate controllers, derived from a mechanics-based and a data-driven approach, for bevel-tip needle control using needle shape manipulation through a template. For a simulated 12-core biopsy procedure, we demonstrate that the mechanics-based controller better reaches desired targets when only the final goal configuration is provided, even under uncertainty in the estimated model parameters, and that providing a feasible needle path is crucial to ensuring safe surgical outcomes when either controller is used for needle shape manipulation.
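Both compared controllers belong to the resolved-rate family, whose generic step maps a desired task-space velocity to input velocities through the pseudoinverse of a model Jacobian. The sketch below shows that generic law on a toy planar 2-link arm; the paper's mechanics-based and data-driven needle-tissue Jacobians are what would replace the toy model.

```python
import numpy as np

def resolved_rate_step(q, x_desired, forward, jacobian, gain=1.0, dt=0.01):
    """One resolved-rate step: qdot = J^+(q) * gain * (x_des - f(q))."""
    error = x_desired - forward(q)
    qdot = np.linalg.pinv(jacobian(q)) @ (gain * error)
    return q + dt * qdot

# Toy planar 2-link arm standing in for the needle-template system.
l1, l2 = 1.0, 1.0
fwd = lambda q: np.array([l1*np.cos(q[0]) + l2*np.cos(q[0]+q[1]),
                          l1*np.sin(q[0]) + l2*np.sin(q[0]+q[1])])
jac = lambda q: np.array([[-l1*np.sin(q[0]) - l2*np.sin(q[0]+q[1]), -l2*np.sin(q[0]+q[1])],
                          [ l1*np.cos(q[0]) + l2*np.cos(q[0]+q[1]),  l2*np.cos(q[0]+q[1])]])

q = np.array([0.3, 0.5])
target = np.array([1.2, 0.8])
for _ in range(500):
    q = resolved_rate_step(q, target, fwd, jac, gain=5.0)
print(np.linalg.norm(target - fwd(q)))  # residual task-space error, near 0
```

The paper's comparison then reduces to how well each candidate Jacobian (mechanics-based vs. data-driven) predicts the true needle-tissue response when its parameters are uncertain.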
Abstract: The dominant paradigm for end-to-end robot learning focuses on optimizing task-specific objectives that solve a single robotic problem, such as picking up an object or reaching a target position. However, recent work on high-capacity models in robotics has shown promise toward training on large collections of diverse, task-agnostic datasets of video demonstrations. These models have shown impressive levels of generalization to unseen circumstances, especially as the amount of data and the model complexity scale. Surgical robot systems that learn from data have struggled to advance as quickly as other fields of robot learning for a few reasons: (1) there is a lack of existing large-scale open-source data on which to train models; (2) it is challenging to model the soft-body deformations these robots work with during surgery, because simulation cannot match the physical and visual complexity of biological tissue; and (3) surgical robots risk harming patients when tested in clinical trials and require more extensive safety measures. This perspective article aims to provide a path toward increasing robot autonomy in robot-assisted surgery through the development of a multi-modal, multi-task, vision-language-action model for surgical robots. Ultimately, we argue that surgical robots are uniquely positioned to benefit from general-purpose models, and we provide three guiding actions toward increased autonomy in robot-assisted surgery.
Abstract: Percutaneous needle insertions are commonly performed for diagnostic and therapeutic purposes as an effective alternative to more invasive surgical procedures. However, the outcome of needle-based approaches relies heavily on the accuracy of needle placement, which remains a challenge even with robot assistance and medical imaging guidance due to needle deflection caused by contact with soft tissues. In this paper, we present a novel mechanics-based 2D bevel-tip needle model that accounts for the nonlinear strain-dependent behavior of biological soft tissues under compression. Real-time finite element simulation allows multiple control inputs along the length of the needle with full three-degree-of-freedom (DOF) planar needle motion. Cross-validation studies using custom-designed multi-layer tissue phantoms as well as heterogeneous chicken breast tissues yield less than 1 mm in-plane error for insertions reaching depths of up to 61 mm, demonstrating the validity and generalizability of the proposed method.
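The paper's model is finite-element and strain-dependent; as a far simpler point of reference, the classic kinematic picture of a 2D bevel-tip needle has the tip trace a circular arc whose curvature is set by the bevel-tissue interaction. The sketch below integrates that constant-curvature path; the curvature value is assumed, and this deliberately simpler model is what the paper's mechanics-based approach improves upon.

```python
import numpy as np

def integrate_bevel_tip(kappa=5.0, depth=0.061, steps=100):
    """Integrate a planar constant-curvature tip path.

    kappa: curvature in 1/m (bevel steering strength); depth: insertion in m.
    Returns a (steps+1, 2) array of tip positions [x_lateral, z_insertion].
    """
    ds = depth / steps
    theta, pos = 0.0, np.zeros(2)
    path = [pos.copy()]
    for _ in range(steps):
        theta += kappa * ds                             # heading bends with curvature
        pos += ds * np.array([np.sin(theta), np.cos(theta)])
        path.append(pos.copy())
    return np.array(path)

path = integrate_bevel_tip()
print(f"lateral deflection at 61 mm depth: {1000 * path[-1, 0]:.2f} mm")
```

Even this toy model predicts millimeter-scale lateral deflection at the paper's 61 mm insertion depth, which is why sub-millimeter in-plane accuracy requires the richer strain-dependent finite element treatment.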
Abstract: Recent advances in robot-assisted surgery have resulted in progressively more precise, efficient, and minimally invasive procedures, sparking a new era of robotic surgical intervention. This enables doctors, in collaborative interaction with robots, to perform traditional or minimally invasive surgeries with improved outcomes through smaller incisions. Recent efforts are working toward making robotic surgery more autonomous, which has the potential to reduce variability in surgical outcomes and reduce complication rates. Deep reinforcement learning methodologies offer scalable solutions for surgical automation, but their effectiveness relies on extensive data acquisition due to the absence of prior knowledge about how to accomplish tasks successfully. Because simulated data collection is computationally intensive, previous works have focused on making existing algorithms more efficient. In this work, we focus on making the simulator more efficient, making training data far more accessible than previously possible. We introduce Surgical Gym, an open-source, high-performance platform for surgical robot learning where both the physics simulation and reinforcement learning occur directly on the GPU. We demonstrate 100-5000x faster training times compared with previous surgical learning platforms. The code is available at: https://github.com/SamuelSchmidgall/SurgicalGym.
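The claimed speedup comes from stepping thousands of environments as one batched tensor operation on the GPU instead of looping over environments in Python. This toy reach task illustrates that vectorized pattern under our own assumptions; it is not the Surgical Gym API.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
num_envs, dof = 4096, 7                      # e.g., thousands of 7-DOF arms

state = torch.zeros(num_envs, dof, device=device)
target = torch.rand(num_envs, dof, device=device)

def step(state, action):
    """All environments advance in a single batched op; rewards likewise."""
    new_state = state + 0.05 * action
    reward = -torch.linalg.norm(new_state - target, dim=1)
    return new_state, reward

for _ in range(100):
    action = torch.clamp(target - state, -1.0, 1.0)   # stand-in "policy"
    state, reward = step(state, action)
print(f"mean reward across {num_envs} envs: {reward.mean().item():.3f}")
```

Because simulation and learning share the same device, no per-step CPU-GPU transfer occurs, which is where prior platforms lose most of their throughput.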
Abstract: Magnetic robotics obviates the physical connection between actuators and end effectors, enabling ultra-minimally invasive surgeries. Even though such a wireless actuation method is highly advantageous in medical applications, the trade-off between the applied force and miniature magnetic end-effector dimensions has been one of the main challenges for practical use in clinically relevant conditions. This trade-off is crucial for applications where in-tissue penetration is required (e.g., needle access, biopsy, and suturing). To increase the forces of such magnetic miniature end effectors to practically useful levels, we propose an impact-force-based suturing needle capable of penetrating in-vitro and ex-vivo samples with 3-DoF planar freedom (planar positioning and in-plane orienting). The proposed optimized design is a custom-built 12 G needle that can generate 1.16 N of penetration force, 56 times stronger than same-sized magnetic counterparts without such an impact force. By containing the fast-moving permanent magnet within the needle in a confined tubular structure, the movement of the overall needle remains slow and easily controllable. The achieved force falls within the range of tissue penetration limits, allowing the needle to penetrate through tissues and follow a suturing method in a teleoperated fashion. We demonstrated in-vitro needle penetration into a bacon strip and successful suturing of a gauze mesh onto an agar gel, mimicking a hernia repair procedure.
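A back-of-envelope illustration of why impact actuation multiplies force: a small magnet's steady magnetic pull is weak, but abruptly stopping it at the end of its travel delivers a much larger peak force on the order of F ~ m*dv/dt. All numbers below are invented for illustration and are not taken from the paper.

```python
m = 2e-3      # kg, hypothetical mass of the internal permanent magnet
dv = 1.5      # m/s, hypothetical speed lost at impact
dt = 2.5e-3   # s, hypothetical impact duration

impact_force = m * dv / dt
print(f"peak impact force ~ {impact_force:.2f} N")  # ~1.2 N, vs mN-scale steady pull
```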
Abstract: Real-time visual localization of needles is necessary for various surgical applications, including surgical automation and visual feedback. In this study, we investigate localization and autonomous robotic control of needles in the context of our magneto-suturing system. Our system holds the potential for surgical manipulation with the benefit of minimal invasiveness and reduced patient side effects. However, the non-linear magnetic fields produce unintuitive forces and demand delicate position-based control that exceeds the capabilities of direct human manipulation. This makes automatic needle localization a necessity. Our localization method combines neural network-based segmentation with classical techniques, and it consistently locates the needle with 0.73 mm RMS error in clean environments and 2.72 mm RMS error in challenging environments with blood and occlusion. The average localization RMS error is 2.16 mm across all environments used in the experiments. We combine this localization method with our closed-loop feedback control system to demonstrate the further applicability of localization to autonomous control. Our needle is able to follow a running suture path in (1) no blood, no tissue; (2) heavy blood, no tissue; (3) no blood, with tissue; and (4) heavy blood, with tissue environments. The tip position tracking error ranges from 2.6 mm to 3.7 mm RMS, opening the door toward autonomous suturing tasks.
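A minimal sketch of the hybrid pipeline the abstract describes: a neural network proposes a per-pixel needle mask, then classical geometry recovers the needle axis and a tip estimate. Here PCA on the mask pixels stands in for the classical stage, and a synthetic mask stands in for real segmentation output; both are our assumptions.

```python
import numpy as np

def localize_needle(mask, threshold=0.5):
    """Fit the needle axis to a soft segmentation mask and return a tip estimate.

    mask: (H, W) array of needle probabilities from a segmentation network.
    Returns (tip_xy, axis_direction) in pixel coordinates.
    """
    ys, xs = np.nonzero(mask > threshold)
    pts = np.stack([xs, ys], axis=1).astype(float)
    center = pts.mean(axis=0)
    # Principal axis of the pixel cloud approximates the needle direction.
    _, _, vt = np.linalg.svd(pts - center, full_matrices=False)
    axis = vt[0]
    # Tip = mask pixel with the largest projection along the axis.
    proj = (pts - center) @ axis
    tip = pts[np.argmax(proj)]
    return tip, axis

# Synthetic diagonal "needle" for demonstration.
mask = np.zeros((64, 64))
for i in range(40):
    mask[10 + i, 5 + i] = 1.0
tip, axis = localize_needle(mask)
print("tip (x, y):", tip, "axis:", np.round(axis, 2))
```

Fitting a global axis through all mask pixels is one plausible way such a pipeline tolerates the blood and occlusion conditions reported: scattered false pixels shift the least-squares fit far less than they would a single-point detector.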