Abstract: In this work, we present SuFIA, the first framework for natural language-guided augmented dexterity for robotic surgical assistants. SuFIA combines the strong reasoning capabilities of large language models (LLMs) with perception modules to implement high-level planning and low-level control of a robot for surgical sub-task execution. This enables a learning-free approach to surgical augmented dexterity, without any in-context examples or motion primitives. SuFIA uses a human-in-the-loop paradigm, restoring control to the surgeon when information is insufficient and thereby mitigating unexpected errors in mission-critical tasks. We evaluate SuFIA on four surgical sub-tasks in a simulation environment and two sub-tasks on a physical surgical robotic platform in the lab, demonstrating its ability to perform common surgical sub-tasks through supervised autonomous operation under challenging physical and workspace conditions. Project website: orbit-surgical.github.io/sufia
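The human-in-the-loop pattern described here (plan with an LLM, execute, but hand control back to the surgeon when information is insufficient) can be sketched compactly. The following is a minimal illustrative sketch, not SuFIA's actual API; every function name (perceive, query_llm, execute_motion, hand_control_to_surgeon) is a hypothetical stub.

```python
"""Minimal sketch of LLM-guided sub-task execution with a human-in-the-loop
fallback. All helpers below are hypothetical stand-ins for SuFIA's actual
perception, LLM, and control interfaces."""

def perceive() -> dict:
    # Stub: a real system would return object poses from perception modules.
    return {"needle_pose": [0.10, 0.20, 0.05]}

def query_llm(task: str, scene: dict) -> dict:
    # Stub: a real system would prompt an LLM with the task and scene state.
    return {"waypoints": [scene["needle_pose"]], "done": True,
            "needs_clarification": False}

def execute_motion(waypoint: list) -> None:
    print(f"moving end-effector to {waypoint}")

def hand_control_to_surgeon(plan: dict) -> None:
    print("insufficient information; returning control to the surgeon")

def run_subtask(task: str, max_steps: int = 20) -> None:
    for _ in range(max_steps):
        scene = perceive()                    # perception modules
        plan = query_llm(task, scene)         # high-level planning via LLM
        if plan["needs_clarification"]:
            hand_control_to_surgeon(plan)     # human-in-the-loop fallback
            return
        for wp in plan["waypoints"]:          # low-level control
            execute_motion(wp)
        if plan["done"]:
            return

run_subtask("pick up the suture needle")
```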
Abstract: The introduction of AI and ML technologies into medical devices has revolutionized healthcare diagnostics and treatments. Medical device manufacturers are keen to maximize the advantages afforded by AI and ML by consolidating multiple applications onto a single platform. However, concurrent execution of several AI applications, each with its own visualization components, leads to unpredictable end-to-end latency, primarily due to GPU resource contention. To mitigate this, manufacturers typically deploy separate workstations for distinct AI applications, thereby increasing financial, energy, and maintenance costs. This paper addresses these challenges within the context of NVIDIA's Holoscan platform, a real-time AI system for streaming sensor data and images. We propose a system design optimized for heterogeneous GPU workloads, encompassing both compute and graphics tasks. Our design leverages CUDA MPS for spatial partitioning of compute workloads and isolates compute and graphics processing onto separate GPUs. We demonstrate significant performance improvements across various end-to-end latency determinism metrics through empirical evaluation with real-world Holoscan medical device applications. For instance, the proposed design reduces maximum latency by 21-30% and improves latency distribution flatness by 17-25% for up to five concurrent endoscopy tool tracking AI applications, compared to a single-GPU baseline. Against a default multi-GPU setup, our optimizations decrease maximum latency by 35% for up to six concurrent applications by improving GPU utilization by 42%. This paper provides clear design insights for AI applications in the edge-computing domain, including medical systems, where performance predictability of concurrent and heterogeneous GPU workloads is a critical requirement.
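The two design levers named in this abstract, CUDA MPS spatial partitioning and compute/graphics GPU isolation, can be illustrated with standard CUDA environment variables. The sketch below assumes a Linux host with two GPUs and a running MPS control daemon (nvidia-cuda-mps-control -d); the launch commands are hypothetical placeholders, and the paper's actual Holoscan configuration may differ.

```python
"""Sketch of the GPU-partitioning idea: MPS-capped compute on one GPU,
graphics isolated on another. CUDA_VISIBLE_DEVICES and
CUDA_MPS_ACTIVE_THREAD_PERCENTAGE are real CUDA environment variables;
everything else is illustrative."""

import os
import subprocess

COMPUTE_GPU = "0"   # AI inference workloads, spatially partitioned via MPS
GRAPHICS_GPU = "1"  # visualization workloads, isolated from compute

def launch_compute(app_cmd: list, thread_pct: int) -> subprocess.Popen:
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = COMPUTE_GPU
    # Spatial partitioning: cap the fraction of SMs each MPS client may use.
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(thread_pct)
    return subprocess.Popen(app_cmd, env=env)

def launch_graphics(app_cmd: list) -> subprocess.Popen:
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = GRAPHICS_GPU  # keep graphics off the compute GPU
    return subprocess.Popen(app_cmd, env=env)

# e.g., five concurrent AI pipelines, each capped at ~20% of the compute GPU's SMs:
# procs = [launch_compute(["python", "tool_tracking_app.py"], 20) for _ in range(5)]
```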
Abstract: Surgical state estimators in robot-assisted surgery (RAS) - especially those trained via learning techniques - rely heavily on datasets that capture surgeon actions in laboratory or real-world surgical tasks. Real-world RAS datasets are costly to acquire, are obtained from multiple surgeons who may use different surgical strategies, and are recorded under uncontrolled conditions in highly complex environments. The combination of high diversity and limited data calls for new learning methods that are robust and invariant to operating conditions and surgical techniques. We propose StiseNet, a Surgical Task Invariance State Estimation Network with an invariance induction framework that minimizes the effects of variations in surgical technique and operating environments inherent to RAS datasets. StiseNet's adversarial architecture learns to separate nuisance factors from information needed for surgical state estimation. StiseNet is shown to outperform state-of-the-art state estimation methods on three datasets (including a new real-world RAS dataset: HERNIA-20).
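The abstract's "adversarial architecture that learns to separate nuisance factors" is commonly implemented with a gradient reversal layer: a nuisance discriminator is trained on shared features while the encoder receives the negated gradient, pushing it toward nuisance-invariant representations. The PyTorch sketch below shows that generic pattern; the dimensions and heads are illustrative, not StiseNet's actual architecture.

```python
import torch
from torch import nn
from torch.autograd import Function

class GradReverse(Function):
    """Identity on the forward pass; flips the gradient sign on the backward
    pass, so the nuisance head trains adversarially against the encoder."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU())  # shared feature encoder
state_head = nn.Linear(128, 10)    # surgical state estimation (task head)
nuisance_head = nn.Linear(128, 5)  # predicts nuisance factor (e.g., surgeon/technique id)

x = torch.randn(32, 64)            # dummy batch of per-frame features
z = encoder(x)
state_logits = state_head(z)                                # trained normally
nuisance_logits = nuisance_head(GradReverse.apply(z, 1.0))  # adversarial branch
```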
Abstract: The stereo correspondence and reconstruction of endoscopic data sub-challenge was organized during the EndoVis challenge at MICCAI 2019 in Shenzhen, China. The task was to perform dense depth estimation using seven training datasets and two test sets of structured-light data captured from porcine cadavers. These were provided by a team at Intuitive Surgical. Ten teams participated on the challenge day. This paper also contains three additional methods submitted after the challenge finished, as well as a supplemental section from these teams on issues they found with the dataset.
Abstract: This paper presents a technique to jointly predict the future trajectories of surgical instruments and the future state(s) of surgical subtasks in robot-assisted surgeries (RAS) using multiple input sources. Such predictions are a necessary first step towards shared control and supervised autonomy of surgical subtasks. Minute-long surgical subtasks, such as suturing or ultrasound scanning, often have distinguishable tool kinematics and visual features, and can be described as a series of fine-grained states with transition schematics. We propose daVinciNet, an end-to-end dual-task model for robot motion and surgical state prediction. daVinciNet performs concurrent end-effector trajectory and surgical state predictions using features extracted from multiple data streams, including robot kinematics, endoscopic vision, and system events. We evaluate our proposed model on an extended Robotic Intra-Operative Ultrasound (RIOUS+) imaging dataset collected on a da Vinci Xi surgical system and on the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS). Our model achieves up to 93.85% short-term (0.5 s) and 82.11% long-term (2 s) state prediction accuracy, as well as 1.07 mm short-term and 5.62 mm long-term trajectory prediction error.
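The dual-task idea (fuse several input streams, then predict both a future trajectory and future states from the shared representation) can be sketched as a small PyTorch module. All dimensions, the LSTM backbone, and the concatenation-based fusion below are illustrative assumptions, not daVinciNet's published architecture.

```python
import torch
from torch import nn

class DualTaskPredictor(nn.Module):
    """Toy dual-task sketch: fuse kinematics, vision, and event streams, then
    predict a future end-effector trajectory and future surgical states."""
    def __init__(self, kin_dim=19, vis_dim=128, evt_dim=4, hidden=256,
                 horizon=10, n_states=8):
        super().__init__()
        self.rnn = nn.LSTM(kin_dim + vis_dim + evt_dim, hidden, batch_first=True)
        self.traj_head = nn.Linear(hidden, horizon * 3)        # future (x, y, z) per step
        self.state_head = nn.Linear(hidden, horizon * n_states)
        self.horizon, self.n_states = horizon, n_states

    def forward(self, kin, vis, evt):
        fused = torch.cat([kin, vis, evt], dim=-1)             # per-frame stream fusion
        _, (h, _) = self.rnn(fused)
        traj = self.traj_head(h[-1]).view(-1, self.horizon, 3)
        states = self.state_head(h[-1]).view(-1, self.horizon, self.n_states)
        return traj, states

model = DualTaskPredictor()
traj, states = model(torch.randn(2, 30, 19), torch.randn(2, 30, 128),
                     torch.randn(2, 30, 4))  # 2 sequences, 30 observed frames
```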
Abstract: Many tasks in robot-assisted surgery (RAS) can be represented by finite-state machines (FSMs), where each state represents either an action (such as picking up a needle) or an observation (such as bleeding). A crucial step towards the automation of such surgical tasks is the temporal perception of the current surgical scene, which requires real-time estimation of the states in the FSMs. The objective of this work is to estimate the current state of the surgical task based on the actions performed or the events that occur as the task progresses. We propose Fusion-KVE, a unified surgical state estimation model that incorporates multiple data sources, including Kinematics, Vision, and system Events. Additionally, we examine the strengths and weaknesses of different state estimation models in segmenting states with different representative features or levels of granularity. We evaluate our model on the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), as well as on a more complex dataset involving robotic intra-operative ultrasound (RIOUS) imaging created using the da Vinci Xi surgical system. Our model achieves a superior frame-wise state estimation accuracy of up to 89.4%, improving on state-of-the-art surgical state estimation models on both the JIGSAWS suturing dataset and our RIOUS dataset.
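Because the abstract stresses that different data sources have different strengths per state, one simple way to realize such fusion is a late-fusion estimator: one classifier per source whose frame-wise logits are combined with learned weights. The sketch below shows that generic pattern only; Fusion-KVE's actual fusion scheme and feature extractors may differ, and all dimensions are assumptions.

```python
import torch
from torch import nn

class LateFusionEstimator(nn.Module):
    """Illustrative late fusion over kinematics, vision, and event streams:
    per-source state classifiers plus learned per-source trust weights."""
    def __init__(self, dims=(19, 128, 4), n_states=8):
        super().__init__()
        self.branches = nn.ModuleList(nn.Linear(d, n_states) for d in dims)
        self.weights = nn.Parameter(torch.ones(len(dims)))

    def forward(self, streams):
        # (n_sources, batch, frames, n_states) stack of per-source logits
        logits = torch.stack([b(s) for b, s in zip(self.branches, streams)])
        w = torch.softmax(self.weights, dim=0)           # learned per-source trust
        return (w[:, None, None, None] * logits).sum(0)  # fused frame-wise logits

model = LateFusionEstimator()
fused = model([torch.randn(2, 30, 19),    # kinematics features
               torch.randn(2, 30, 128),   # vision features
               torch.randn(2, 30, 4)])    # system-event features
```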
Abstract: In 2015 we began a sub-challenge at the EndoVis workshop at MICCAI in Munich, using endoscope images of ex-vivo tissue with automatically generated annotations from robot forward kinematics and instrument CAD models. However, the limited background variation and simple motion rendered the dataset uninformative for learning which techniques would be suitable for segmentation in real surgery. In 2017, at the same workshop in Quebec, we introduced the robotic instrument segmentation dataset, with 10 teams participating in the challenge to perform binary, articulating-parts, and type segmentation of da Vinci instruments. This challenge included realistic instrument motion and more complex porcine tissue as background, and was widely addressed with modifications of U-Nets and other popular CNN architectures. In 2018 we added to the complexity by introducing a set of anatomical objects and medical devices to the segmented classes. To avoid over-complicating the challenge, we continued with porcine data, which is dramatically simpler than human tissue due to the lack of fatty tissue occluding many organs.
Abstract: In mainstream computer vision and machine learning, public datasets such as ImageNet, COCO, and KITTI have helped drive enormous improvements by enabling researchers to understand the strengths and limitations of different algorithms via performance comparison. However, this type of approach has had limited translation to problems in robotic assisted surgery, as this field has never established the same level of common datasets and benchmarking methods. In 2015 a sub-challenge was introduced at the EndoVis workshop where a set of robotic images was provided with automatically generated annotations from robot forward kinematics. However, there were issues with this dataset due to the limited background variation, lack of complex motion, and inaccuracies in the annotations. In this work we present the results of the 2017 challenge on robotic instrument segmentation, which involved 10 teams participating in binary, parts-based, and type-based segmentation of articulated da Vinci robotic instruments.
Abstract: Detection, tracking, and pose estimation of surgical instruments are crucial tasks for computer assistance during minimally invasive robotic surgery. In the majority of cases, the first step is the automatic segmentation of surgical tools. Prior work has focused on binary segmentation, where the objective is to label every pixel in an image as tool or background. We improve upon previous work in two major ways. First, we leverage recent techniques such as deep residual learning and dilated convolutions to advance binary segmentation performance. Second, we extend the approach to multi-class segmentation, which lets us segment different parts of the tool in addition to the background. We demonstrate the performance of this method on the MICCAI Endoscopic Vision Challenge Robotic Instruments dataset.
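The two techniques named here, residual learning and dilated convolutions, combine naturally in a segmentation backbone: dilation enlarges the receptive field without downsampling, while the skip connection eases optimization. The PyTorch sketch below illustrates the generic building block; channel counts, dilation rates, and the 1x1 multi-class head are illustrative assumptions, not the paper's exact network.

```python
import torch
from torch import nn

class DilatedResidualBlock(nn.Module):
    """Toy residual block with dilated 3x3 convolutions: grows the receptive
    field at full resolution, which suits dense per-pixel labeling."""
    def __init__(self, channels=64, dilation=2):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3,
                               padding=dilation, dilation=dilation)
        self.conv2 = nn.Conv2d(channels, channels, 3,
                               padding=dilation, dilation=dilation)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)  # residual (skip) connection

# Multi-class head: one output channel per class (tool parts + background).
head = nn.Conv2d(64, 4, kernel_size=1)
feat = DilatedResidualBlock()(torch.randn(1, 64, 128, 160))
logits = head(feat)  # per-pixel class scores, same spatial resolution
```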