Abstract:Single-source domain generalization (SDG) aims to learn a model from a single source domain that can generalize well on unseen target domains. This is an important task in computer vision, particularly relevant to medical imaging where domain shifts are common. In this work, we consider a challenging yet practical setting: SDG for cross-modality medical image segmentation. We combine causality-inspired theoretical insights on learning domain-invariant representations with recent advancements in diffusion-based augmentation to improve generalization across diverse imaging modalities. Guided by the ``intervention-augmentation equivariant'' principle, we use controlled diffusion models (DMs) to simulate diverse imaging styles while preserving the content, leveraging rich generative priors in large-scale pretrained DMs to comprehensively perturb the multidimensional style variable. Extensive experiments on challenging cross-modality segmentation tasks demonstrate that our approach consistently outperforms state-of-the-art SDG methods across three distinct anatomies and imaging modalities. The source code is available at \href{https://github.com/ratschlab/ICMSeg}{https://github.com/ratschlab/ICMSeg}.
Abstract:Neural networks achieve state-of-the-art performance in many supervised learning tasks when the training data distribution matches the test data distribution. However, their performance drops significantly under domain (covariate) shift, a prevalent issue in medical image segmentation due to varying acquisition settings across different scanner models and protocols. Recently, foundational models (FMs) trained on large datasets have gained attention for their ability to be adapted for downstream tasks and achieve state-of-the-art performance with excellent generalization capabilities on natural images. However, their effectiveness in medical image segmentation remains underexplored. In this paper, we investigate the domain generalization performance of various FMs, including DinoV2, SAM, MedSAM, and MAE, when fine-tuned using various parameter-efficient fine-tuning (PEFT) techniques such as Ladder and Rein (+LoRA) and decoder heads. We introduce a novel decode head architecture, HQHSAM, which simply integrates elements from two state-of-the-art decoder heads, HSAM and HQSAM, to enhance segmentation performance. Our extensive experiments on multiple datasets, encompassing various anatomies and modalities, reveal that FMs, particularly with the HQHSAM decode head, improve domain generalization for medical image segmentation. Moreover, we found that the effectiveness of PEFT techniques varies across different FMs. These findings underscore the potential of FMs to enhance the domain generalization performance of neural networks in medical image segmentation across diverse clinical settings, providing a solid foundation for future research. Code and models are available for research purposes at \url{https://github.com/kerem-cekmeceli/Foundation-Models-for-Medical-Imagery}.
Abstract:Implicit functions such as Neural Radiance Fields (NeRFs), occupancy networks, and signed distance functions (SDFs) have become pivotal in computer vision for reconstructing detailed object shapes from sparse views. Achieving optimal performance with these models can be challenging due to the extreme sparsity of inputs and distribution shifts induced by data corruptions. To this end, large, noise-free synthetic datasets can serve as shape priors to help models fill in gaps, but the resulting reconstructions must be approached with caution. Uncertainty estimation is crucial for assessing the quality of these reconstructions, particularly in identifying areas where the model is uncertain about the parts it has inferred from the prior. In this paper, we introduce Dropsembles, a novel method for uncertainty estimation in tuned implicit functions. We demonstrate the efficacy of our approach through a series of experiments, starting with toy examples and progressing to a real-world scenario. Specifically, we train a Convolutional Occupancy Network on synthetic anatomical data and test it on low-resolution MRI segmentations of the lumbar spine. Our results show that Dropsembles achieve the accuracy and calibration levels of deep ensembles but with significantly less computational cost.