Abstract:We propose the first joint-task learning framework for brain and vessel segmentation (JoB-VS) from Time-of-Flight Magnetic Resonance images. Unlike state-of-the-art vessel segmentation methods, our approach avoids the pre-processing step of implementing a model to extract the brain from the volumetric input data. Skipping this additional step makes our method an end-to-end vessel segmentation framework. JoB-VS uses a lattice architecture that favors the segmentation of structures of different scales (e.g., the brain and vessels). Its segmentation head allows the simultaneous prediction of the brain and vessel mask. Moreover, we generate data augmentation with adversarial examples, which our results demonstrate to enhance the performance. JoB-VS achieves 70.03% mean AP and 69.09% F1-score in the OASIS-3 dataset and is capable of generalizing the segmentation in the IXI dataset. These results show the adequacy of JoB-VS for the challenging task of vessel segmentation in complete TOF-MRA images.
Abstract:Most benchmarks for studying surgical interventions focus on a specific challenge instead of leveraging the intrinsic complementarity among different tasks. In this work, we present a new experimental framework towards holistic surgical scene understanding. First, we introduce the Phase, Step, Instrument, and Atomic Visual Action recognition (PSI-AVA) Dataset. PSI-AVA includes annotations for both long-term (Phase and Step recognition) and short-term reasoning (Instrument detection and novel Atomic Action recognition) in robot-assisted radical prostatectomy videos. Second, we present Transformers for Action, Phase, Instrument, and steps Recognition (TAPIR) as a strong baseline for surgical scene understanding. TAPIR leverages our dataset's multi-level annotations as it benefits from the learned representation on the instrument detection task to improve its classification capacity. Our experimental results in both PSI-AVA and other publicly available databases demonstrate the adequacy of our framework to spur future research on holistic surgical scene understanding.