Abstract:The Transformer architecture has opened a new paradigm in the domain of deep learning with its ability to model long-range dependencies and capture global context and has outpaced the traditional Convolution Neural Networks (CNNs) in many aspects. However, applying Transformer models to 3D medical image datasets presents significant challenges due to their high training time, and memory requirements, which not only hinder scalability but also contribute to elevated CO$_2$ footprint. This has led to an exploration of alternative models that can maintain or even improve performance while being more efficient and environmentally sustainable. Recent advancements in Structured State Space Models (SSMs) effectively address some of the inherent limitations of Transformers, particularly their high memory and computational demands. Inspired by these advancements, we propose an efficient 3D segmentation model for medical imaging called SegResMamba, designed to reduce computation complexity, memory usage, training time, and environmental impact while maintaining high performance. Our model uses less than half the memory during training compared to other state-of-the-art (SOTA) architectures, achieving comparable performance with significantly reduced resource demands.
Abstract:An accurate detection and tracking of devices such as guiding catheters in live X-ray image acquisitions is an essential prerequisite for endovascular cardiac interventions. This information is leveraged for procedural guidance, e.g., directing stent placements. To ensure procedural safety and efficacy, there is a need for high robustness no failures during tracking. To achieve that, one needs to efficiently tackle challenges, such as: device obscuration by contrast agent or other external devices or wires, changes in field-of-view or acquisition angle, as well as the continuous movement due to cardiac and respiratory motion. To overcome the aforementioned challenges, we propose a novel approach to learn spatio-temporal features from a very large data cohort of over 16 million interventional X-ray frames using self-supervision for image sequence data. Our approach is based on a masked image modeling technique that leverages frame interpolation based reconstruction to learn fine inter-frame temporal correspondences. The features encoded in the resulting model are fine-tuned downstream. Our approach achieves state-of-the-art performance and in particular robustness compared to ultra optimized reference solutions (that use multi-stage feature fusion, multi-task and flow regularization). The experiments show that our method achieves 66.31% reduction in maximum tracking error against reference solutions (23.20% when flow regularization is used); achieving a success score of 97.95% at a 3x faster inference speed of 42 frames-per-second (on GPU). The results encourage the use of our approach in various other tasks within interventional image analytics that require effective understanding of spatio-temporal semantics.
Abstract:The vast majority of the outlet glaciers and ice streams of the polar ice sheets end in the ocean. Ice mass loss via calving of the glaciers into the ocean has increased over the last few decades. Information on the temporal variability of the calving front position provides fundamental information on the state of the glacier and ice stream, which can be exploited as calibration and validation data to enhance ice dynamics modeling. To identify the calving front position automatically, deep neural network-based semantic segmentation pipelines can be used to delineate the acquired SAR imagery. However, the extreme class imbalance is highly challenging for the accurate calving front segmentation in these images. Therefore, we propose the use of the Mathews correlation coefficient (MCC) as an early stopping criterion because of its symmetrical properties and its invariance towards class imbalance. Moreover, we propose an improvement to the distance map-based binary cross-entropy (BCE) loss function. The distance map adds context to the loss function about the important regions for segmentation and helps accounting for the imbalanced data. Using Mathews correlation coefficient as early stopping demonstrates an average 15% dice coefficient improvement compared to the commonly used BCE. The modified distance map loss further improves the segmentation performance by another 2%. These results are encouraging as they support the effectiveness of the proposed methods for segmentation problems suffering from extreme class imbalances.