Abstract:This chapter explores anomaly localization in medical images using denoising diffusion models. After providing a brief methodological background of these models, including their application to image reconstruction and their conditioning using guidance mechanisms, we provide an overview of available datasets and evaluation metrics suitable for their application to anomaly localization in medical images. In this context, we discuss supervision schemes ranging from fully supervised segmentation to semi-supervised, weakly supervised, self-supervised, and unsupervised methods, and provide insights into the effectiveness and limitations of these approaches. Furthermore, we highlight open challenges in anomaly localization, including detection bias, domain shift, computational cost, and model interpretability. Our goal is to provide an overview of the current state of the art in the field, outline research gaps, and highlight the potential of diffusion models for robust anomaly localization in medical images.
Abstract:Deep learning models in medical imaging often encounter challenges when adapting to new clinical settings unseen during training. Test-time adaptation offers a promising approach to optimize models for these unseen domains, yet its application in anomaly detection (AD) remains largely unexplored. AD aims to efficiently identify deviations from normative distributions; however, full adaptation, including pathological shifts, may inadvertently learn the anomalies it intends to detect. We introduce a novel concept of \emph{selective} test-time adaptation that utilizes the inherent characteristics of deep pre-trained features to adapt \emph{selectively} in a zero-shot manner to any test image from an unseen domain. This approach employs a model-agnostic, lightweight multi-layer perceptron for neural implicit representations, enabling the adaptation of outputs from any reconstruction-based AD method without altering the source-trained model. Rigorous validation in brain AD demonstrated that our strategy substantially enhances detection accuracy for multiple conditions and different target distributions. Specifically, our method improves the detection rates by up to 78\% for enlarged ventricles and 24\% for edemas.
Abstract:This paper presents a method for virtual contrast enhancement in breast MRI, offering a promising non-invasive alternative to traditional contrast agent-based DCE-MRI acquisition. Using a conditional generative adversarial network, we predict DCE-MRI images, including jointly-generated sequences of multiple corresponding DCE-MRI timepoints, from non-contrast-enhanced MRIs, enabling tumor localization and characterization without the associated health risks. Furthermore, we qualitatively and quantitatively evaluate the synthetic DCE-MRI images, proposing a multi-metric Scaled Aggregate Measure (SAMe), assessing their utility in a tumor segmentation downstream task, and conclude with an analysis of the temporal patterns in multi-sequence DCE-MRI generation. Our approach demonstrates promising results in generating realistic and useful DCE-MRI sequences, highlighting the potential of virtual contrast enhancement for improving breast cancer diagnosis and treatment, particularly for patients where contrast agent administration is contraindicated.
Abstract:Background: The integration of multi-stain histopathology images through deep learning poses a significant challenge in digital histopathology. Current multi-modal approaches struggle with data heterogeneity and missing data. This study aims to overcome these limitations by developing a novel transformer model for multi-stain integration that can handle missing data during training as well as inference. Methods: We propose UNICORN (UNiversal modality Integration Network for CORonary classificatioN) a multi-modal transformer capable of processing multi-stain histopathology for atherosclerosis severity class prediction. The architecture comprises a two-stage, end-to-end trainable model with specialized modules utilizing transformer self-attention blocks. The initial stage employs domain-specific expert modules to extract features from each modality. In the subsequent stage, an aggregation expert module integrates these features by learning the interactions between the different data modalities. Results: Evaluation was performed using a multi-class dataset of atherosclerotic lesions from the Munich Cardiovascular Studies Biobank (MISSION), using over 4,000 paired multi-stain whole slide images (WSIs) from 170 deceased individuals on 7 prespecified segments of the coronary tree, each stained according to four histopathological protocols. UNICORN achieved a classification accuracy of 0.67, outperforming other state-of-the-art models. The model effectively identifies relevant tissue phenotypes across stainings and implicitly models disease progression. Conclusion: Our proposed multi-modal transformer model addresses key challenges in medical data analysis, including data heterogeneity and missing modalities. Explainability and the model's effectiveness in predicting atherosclerosis progression underscores its potential for broader applications in medical research.
Abstract:We propose in this paper a texture-invariant 2D keypoints descriptor specifically designed for matching preoperative Magnetic Resonance (MR) images with intraoperative Ultrasound (US) images. We introduce a matching-by-synthesis strategy, where intraoperative US images are synthesized from MR images accounting for multiple MR modalities and intraoperative US variability. We build our training set by enforcing keypoints localization over all images then train a patient-specific descriptor network that learns texture-invariant discriminant features in a supervised contrastive manner, leading to robust keypoints descriptors. Our experiments on real cases with ground truth show the effectiveness of the proposed approach, outperforming the state-of-the-art methods and achieving 80.35% matching precision on average.
Abstract:Accurate assessment of lymph node size in 3D CT scans is crucial for cancer staging, therapeutic management, and monitoring treatment response. Existing state-of-the-art segmentation frameworks in medical imaging often rely on fully annotated datasets. However, for lymph node segmentation, these datasets are typically small due to the extensive time and expertise required to annotate the numerous lymph nodes in 3D CT scans. Weakly-supervised learning, which leverages incomplete or noisy annotations, has recently gained interest in the medical imaging community as a potential solution. Despite the variety of weakly-supervised techniques proposed, most have been validated only on private datasets or small publicly available datasets. To address this limitation, the Mediastinal Lymph Node Quantification (LNQ) challenge was organized in conjunction with the 26th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023). This challenge aimed to advance weakly-supervised segmentation methods by providing a new, partially annotated dataset and a robust evaluation framework. A total of 16 teams from 5 countries submitted predictions to the validation leaderboard, and 6 teams from 3 countries participated in the evaluation phase. The results highlighted both the potential and the current limitations of weakly-supervised approaches. On one hand, weakly-supervised approaches obtained relatively good performance with a median Dice score of $61.0\%$. On the other hand, top-ranked teams, with a median Dice score exceeding $70\%$, boosted their performance by leveraging smaller but fully annotated datasets to combine weak supervision and full supervision. This highlights both the promise of weakly-supervised methods and the ongoing need for high-quality, fully annotated data to achieve higher segmentation performance.
Abstract:Recent studies have underscored the capabilities of natural imaging foundation models to serve as powerful feature extractors, even in a zero-shot setting for medical imaging data. Most commonly, a shallow multi-layer perceptron (MLP) is appended to the feature extractor to facilitate end-to-end learning and downstream prediction tasks such as classification, thus representing the de facto standard. However, as graph neural networks (GNNs) have become a practicable choice for various tasks in medical research in the recent past, we direct attention to the question of how effective GNNs are compared to MLP prediction heads for the task of 3D medical image classification, proposing them as a potential alternative. In our experiments, we devise a subject-level graph for each volumetric dataset instance. Therein latent representations of all slices in the volume, encoded through a DINOv2 pretrained vision transformer (ViT), constitute the nodes and their respective node features. We use public datasets to compare the classification heads numerically and evaluate various graph construction and graph convolution methods in our experiments. Our findings show enhancements of the GNN in classification performance and substantial improvements in runtime compared to an MLP prediction head. Additional robustness evaluations further validate the promising performance of the GNN, promoting them as a suitable alternative to traditional MLP classification heads. Our code is publicly available at: https://github.com/compai-lab/2024-miccai-grail-kiechle
Abstract:Generally, the small size of public medical imaging datasets coupled with stringent privacy concerns, hampers the advancement of data-hungry deep learning models in medical imaging. This study addresses these challenges for 3D cardiac MRI images in the short-axis view. We propose Latent Diffusion Models that generate synthetic images conditioned on medical attributes, while ensuring patient privacy through differentially private model training. To our knowledge, this is the first work to apply and quantify differential privacy in 3D medical image generation. We pre-train our models on public data and finetune them with differential privacy on the UK Biobank dataset. Our experiments reveal that pre-training significantly improves model performance, achieving a Fr\'echet Inception Distance (FID) of 26.77 at $\epsilon=10$, compared to 92.52 for models without pre-training. Additionally, we explore the trade-off between privacy constraints and image quality, investigating how tighter privacy budgets affect output controllability and may lead to degraded performance. Our results demonstrate that proper consideration during training with differential privacy can substantially improve the quality of synthetic cardiac MRI images, but there are still notable challenges in achieving consistent medical realism.
Abstract:Denoising diffusion probabilistic models enable high-fidelity image synthesis and editing. In biomedicine, these models facilitate counterfactual image editing, producing pairs of images where one is edited to simulate hypothetical conditions. For example, they can model the progression of specific diseases, such as stroke lesions. However, current image editing techniques often fail to generate realistic biomedical counterfactuals, either by inadequately modeling indirect pathological effects like brain atrophy or by excessively altering the scan, which disrupts correspondence to the original images. Here, we propose MedEdit, a conditional diffusion model for medical image editing. MedEdit induces pathology in specific areas while balancing the modeling of disease effects and preserving the integrity of the original scan. We evaluated MedEdit on the Atlas v2.0 stroke dataset using Frechet Inception Distance and Dice scores, outperforming state-of-the-art diffusion-based methods such as Palette (by 45%) and SDEdit (by 61%). Additionally, clinical evaluations by a board-certified neuroradiologist confirmed that MedEdit generated realistic stroke scans indistinguishable from real ones. We believe this work will enable counterfactual image editing research to further advance the development of realistic and clinically useful imaging tools.
Abstract:Ultrasonography is an essential tool in mid-pregnancy for assessing fetal development, appreciated for its non-invasive and real-time imaging capabilities. Yet, the interpretation of ultrasound images is often complicated by acoustic shadows, speckle noise, and other artifacts that obscure crucial diagnostic details. To address these challenges, our study presents a novel unsupervised anomaly detection framework specifically designed for fetal ultrasound imaging. This framework incorporates gestational age filtering, precise identification of fetal standard planes, and targeted segmentation of brain regions to enhance diagnostic accuracy. Furthermore, we introduce the use of denoising diffusion probabilistic models in this context, marking a significant innovation in detecting previously unrecognized anomalies. We rigorously evaluated the framework using various diffusion-based anomaly detection methods, noise types, and noise levels. Notably, AutoDDPM emerged as the most effective, achieving an area under the precision-recall curve of 79.8\% in detecting anomalies. This advancement holds promise for improving the tools available for nuanced and effective prenatal diagnostics.