CIBM Center for Biomedical Imaging, Switzerland, Department of Radiology, Lausanne University Hospital
Abstract:Diffusion Magnetic Resonance Imaging (dMRI) is a non-invasive method for depicting brain microstructure in vivo. Fiber orientation distributions (FODs) are mathematical representations extensively used to map white matter fiber configurations. Recently, FOD estimation with deep neural networks has seen growing success, in particular, those of neonates estimated with fewer diffusion measurements. These methods are mostly trained on target FODs reconstructed with multi-shell multi-tissue constrained spherical deconvolution (MSMT-CSD), which might not be the ideal ground truth for developing brains. Here, we investigate this hypothesis by training a state-of-the-art model based on the U-Net architecture on both MSMT-CSD and single-shell three-tissue constrained spherical deconvolution (SS3T-CSD). Our results suggest that SS3T-CSD might be more suited for neonatal brains, given that the ratio between single and multiple fiber-estimated voxels with SS3T-CSD is more realistic compared to MSMT-CSD. Additionally, increasing the number of input gradient directions significantly improves performance with SS3T-CSD over MSMT-CSD. Finally, in an age domain-shift setting, SS3T-CSD maintains robust performance across age groups, indicating its potential for more accurate neonatal brain imaging.
Abstract:Uncertainty quantification (UQ) has become critical for evaluating the reliability of artificial intelligence systems, especially in medical image segmentation. This study addresses the interpretability of instance-wise uncertainty values in deep learning models for focal lesion segmentation in magnetic resonance imaging, specifically cortical lesion (CL) segmentation in multiple sclerosis. CL segmentation presents several challenges, including the complexity of manual segmentation, high variability in annotation, data scarcity, and class imbalance, all of which contribute to aleatoric and epistemic uncertainty. We explore how UQ can be used not only to assess prediction reliability but also to provide insights into model behavior, detect biases, and verify the accuracy of UQ methods. Our research demonstrates the potential of instance-wise uncertainty values to offer post hoc global model explanations, serving as a sanity check for the model. The implementation is available at https://github.com/NataliiaMolch/interpret-lesion-unc.
Abstract:In recent years, explainable methods for artificial intelligence (XAI) have tried to reveal and describe models' decision mechanisms in the case of classification tasks. However, XAI for semantic segmentation and in particular for single instances has been little studied to date. Understanding the process underlying automatic segmentation of single instances is crucial to reveal what information was used to detect and segment a given object of interest. In this study, we proposed two instance-level explanation maps for semantic segmentation based on SmoothGrad and Grad-CAM++ methods. Then, we investigated their relevance for the detection and segmentation of white matter lesions (WML), a magnetic resonance imaging (MRI) biomarker in multiple sclerosis (MS). 687 patients diagnosed with MS for a total of 4043 FLAIR and MPRAGE MRI scans were collected at the University Hospital of Basel, Switzerland. Data were randomly split into training, validation and test sets to train a 3D U-Net for MS lesion segmentation. We observed 3050 true positive (TP), 1818 false positive (FP), and 789 false negative (FN) cases. We generated instance-level explanation maps for semantic segmentation, by developing two XAI methods based on SmoothGrad and Grad-CAM++. We investigated: 1) the distribution of gradients in saliency maps with respect to both input MRI sequences; 2) the model's response in the case of synthetic lesions; 3) the amount of perilesional tissue needed by the model to segment a lesion. Saliency maps (based on SmoothGrad) in FLAIR showed positive values inside a lesion and negative in its neighborhood. Peak values of saliency maps generated for these four groups of volumes presented distributions that differ significantly from one another, suggesting a quantitative nature of the proposed saliency. Contextual information of 7mm around the lesion border was required for their segmentation.
Abstract:Segmentation of fetal brain tissue from magnetic resonance imaging (MRI) plays a crucial role in the study of in utero neurodevelopment. However, automated tools face substantial domain shift challenges as they must be robust to highly heterogeneous clinical data, often limited in numbers and lacking annotations. Indeed, high variability of the fetal brain morphology, MRI acquisition parameters, and superresolution reconstruction (SR) algorithms adversely affect the model's performance when evaluated out-of-domain. In this work, we introduce FetalSynthSeg, a domain randomization method to segment fetal brain MRI, inspired by SynthSeg. Our results show that models trained solely on synthetic data outperform models trained on real data in out-ofdomain settings, validated on a 120-subject cross-domain dataset. Furthermore, we extend our evaluation to 40 subjects acquired using lowfield (0.55T) MRI and reconstructed with novel SR models, showcasing robustness across different magnetic field strengths and SR algorithms. Leveraging a generative synthetic approach, we tackle the domain shift problem in fetal brain MRI and offer compelling prospects for applications in fields with limited and highly heterogeneous data.
Abstract:Segmentation is a critical step in analyzing the developing human fetal brain. There have been vast improvements in automatic segmentation methods in the past several years, and the Fetal Brain Tissue Annotation (FeTA) Challenge 2021 helped to establish an excellent standard of fetal brain segmentation. However, FeTA 2021 was a single center study, and the generalizability of algorithms across different imaging centers remains unsolved, limiting real-world clinical applicability. The multi-center FeTA Challenge 2022 focuses on advancing the generalizability of fetal brain segmentation algorithms for magnetic resonance imaging (MRI). In FeTA 2022, the training dataset contained images and corresponding manually annotated multi-class labels from two imaging centers, and the testing data contained images from these two imaging centers as well as two additional unseen centers. The data from different centers varied in many aspects, including scanners used, imaging parameters, and fetal brain super-resolution algorithms applied. 16 teams participated in the challenge, and 17 algorithms were evaluated. Here, a detailed overview and analysis of the challenge results are provided, focusing on the generalizability of the submissions. Both in- and out of domain, the white matter and ventricles were segmented with the highest accuracy, while the most challenging structure remains the cerebral cortex due to anatomical complexity. The FeTA Challenge 2022 was able to successfully evaluate and advance generalizability of multi-class fetal brain tissue segmentation algorithms for MRI and it continues to benchmark new algorithms. The resulting new methods contribute to improving the analysis of brain development in utero.
Abstract:Deep learning models have shown great promise in estimating tissue microstructure from limited diffusion magnetic resonance imaging data. However, these models face domain shift challenges when test and train data are from different scanners and protocols, or when the models are applied to data with inherent variations such as the developing brains of infants and children scanned at various ages. Several techniques have been proposed to address some of these challenges, such as data harmonization or domain adaptation in the adult brain. However, those techniques remain unexplored for the estimation of fiber orientation distribution functions in the rapidly developing brains of infants. In this work, we extensively investigate the age effect and domain shift within and across two different cohorts of 201 newborns and 165 babies using the Method of Moments and fine-tuning strategies. Our results show that reduced variations in the microstructural development of babies in comparison to newborns directly impact the deep learning models' cross-age performance. We also demonstrate that a small number of target domain samples can significantly mitigate domain shift problems.
Abstract:This paper explores uncertainty quantification (UQ) as an indicator of the trustworthiness of automated deep-learning (DL) tools in the context of white matter lesion (WML) segmentation from magnetic resonance imaging (MRI) scans of multiple sclerosis (MS) patients. Our study focuses on two principal aspects of uncertainty in structured output segmentation tasks. Firstly, we postulate that a good uncertainty measure should indicate predictions likely to be incorrect with high uncertainty values. Second, we investigate the merit of quantifying uncertainty at different anatomical scales (voxel, lesion, or patient). We hypothesize that uncertainty at each scale is related to specific types of errors. Our study aims to confirm this relationship by conducting separate analyses for in-domain and out-of-domain settings. Our primary methodological contributions are (i) the development of novel measures for quantifying uncertainty at lesion and patient scales, derived from structural prediction discrepancies, and (ii) the extension of an error retention curve analysis framework to facilitate the evaluation of UQ performance at both lesion and patient scales. The results from a multi-centric MRI dataset of 172 patients demonstrate that our proposed measures more effectively capture model errors at the lesion and patient scales compared to measures that average voxel-scale uncertainty values. We provide the UQ protocols code at https://github.com/Medical-Image-Analysis-Laboratory/MS_WML_uncs.
Abstract:Fetal brain MRI is becoming an increasingly relevant complement to neurosonography for perinatal diagnosis, allowing fundamental insights into fetal brain development throughout gestation. However, uncontrolled fetal motion and heterogeneity in acquisition protocols lead to data of variable quality, potentially biasing the outcome of subsequent studies. We present FetMRQC, an open-source machine-learning framework for automated image quality assessment and quality control that is robust to domain shifts induced by the heterogeneity of clinical data. FetMRQC extracts an ensemble of quality metrics from unprocessed anatomical MRI and combines them to predict experts' ratings using random forests. We validate our framework on a pioneeringly large and diverse dataset of more than 1600 manually rated fetal brain T2-weighted images from four clinical centers and 13 different scanners. Our study shows that FetMRQC's predictions generalize well to unseen data while being interpretable. FetMRQC is a step towards more robust fetal brain neuroimaging, which has the potential to shed new insights on the developing human brain.
Abstract:The brain white matter consists of a set of tracts that connect distinct regions of the brain. Segmentation of these tracts is often needed for clinical and research studies. Diffusion-weighted MRI offers unique contrast to delineate these tracts. However, existing segmentation methods rely on intermediate computations such as tractography or estimation of fiber orientation density. These intermediate computations, in turn, entail complex computations that can result in unnecessary errors. Moreover, these intermediate computations often require dense multi-shell measurements that are unavailable in many clinical and research applications. As a result, current methods suffer from low accuracy and poor generalizability. Here, we propose a new deep learning method that segments these tracts directly from the diffusion MRI data, thereby sidestepping the intermediate computation errors. Our experiments show that this method can achieve segmentation accuracy that is on par with the state of the art methods (mean Dice Similarity Coefficient of 0.826). Compared with the state of the art, our method offers far superior generalizability to undersampled data that are typical of clinical studies and to data obtained with different acquisition protocols. Moreover, we propose a new method for detecting inaccurate segmentations and show that it is more accurate than standard methods that are based on estimation uncertainty quantification. The new methods can serve many critically important clinical and scientific applications that require accurate and reliable non-invasive segmentation of white matter tracts.
Abstract:Accurate segmentation of thalamic nuclei, crucial for understanding their role in healthy cognition and in pathologies, is challenging to achieve on standard T1-weighted (T1w) magnetic resonance imaging (MRI) due to poor image contrast. White-matter-nulled (WMn) MRI sequences improve intrathalamic contrast but are not part of clinical protocols or extant databases. Here, we introduce Histogram-based polynomial synthesis (HIPS), a fast preprocessing step that synthesizes WMn-like image contrast from standard T1w MRI using a polynomial approximation. HIPS was incorporated into our Thalamus Optimized Multi-Atlas Segmentation (THOMAS) pipeline, developed and optimized for WMn MRI. HIPS-THOMAS was compared to a convolutional neural network (CNN)-based segmentation method and THOMAS modified for T1w images (T1w-THOMAS). The robustness and accuracy of the three methods were tested across different image contrasts, scanner manufacturers, and field strength. HIPS-synthesized images improved intra-thalamic contrast and thalamic boundaries, and their segmentations yielded significantly better mean Dice, lower percentage of volume error, and lower standard deviations compared to both the CNN method and T1w-THOMAS. Finally, using THOMAS, HIPS-synthesized images were as effective as WMn images for identifying thalamic nuclei atrophy in alcohol use disorders subjects relative to healthy controls, with a higher area under the ROC curve compared to T1w-THOMAS (0.79 vs 0.73).