Abstract:Image decomposition aims to analyze an image into elementary components, which is essential for numerous downstream tasks and also by nature provides certain interpretability to the analysis. Deep learning can be powerful for such tasks, but surprisingly their combination with a focus on interpretability and generalizability is rarely explored. In this work, we introduce a novel framework for interpretable deep image decomposition, combining hierarchical Bayesian modeling and deep learning to create an architecture-modularized and model-generalizable deep neural network (DNN). The proposed framework includes three steps: (1) hierarchical Bayesian modeling of image decomposition, (2) transforming the inference problem into optimization tasks, and (3) deep inference via a modularized Bayesian DNN. We further establish a theoretical connection between the loss function and the generalization error bound, which inspires a new test-time adaptation approach for out-of-distribution scenarios. We instantiated the application using two downstream tasks, \textit{i.e.}, image denoising, and unsupervised anomaly detection, and the results demonstrated improved generalizability as well as interpretability of our methods. The source code will be released upon the acceptance of this paper.
Abstract:Contrast-enhanced brain MRI (CE-MRI) is a valuable diagnostic technique but may pose health risks and incur high costs. To create safer alternatives, multi-modality medical image translation aims to synthesize CE-MRI images from other available modalities. Although existing methods can generate promising predictions, they still face two challenges, i.e., exhibiting over-confidence and lacking interpretability on predictions. To address the above challenges, this paper introduces TrustI2I, a novel trustworthy method that reformulates multi-to-one medical image translation problem as a multimodal regression problem, aiming to build an uncertainty-aware and reliable system. Specifically, our method leverages deep evidential regression to estimate prediction uncertainties and employs an explicit intermediate and late fusion strategy based on the Mixture of Normal Inverse Gamma (MoNIG) distribution, enhancing both synthesis quality and interpretability. Additionally, we incorporate uncertainty calibration to improve the reliability of uncertainty. Validation on the BraTS2018 dataset demonstrates that our approach surpasses current methods, producing higher-quality images with rational uncertainty estimation.
Abstract:Segmentation is a critical step in analyzing the developing human fetal brain. There have been vast improvements in automatic segmentation methods in the past several years, and the Fetal Brain Tissue Annotation (FeTA) Challenge 2021 helped to establish an excellent standard of fetal brain segmentation. However, FeTA 2021 was a single center study, and the generalizability of algorithms across different imaging centers remains unsolved, limiting real-world clinical applicability. The multi-center FeTA Challenge 2022 focuses on advancing the generalizability of fetal brain segmentation algorithms for magnetic resonance imaging (MRI). In FeTA 2022, the training dataset contained images and corresponding manually annotated multi-class labels from two imaging centers, and the testing data contained images from these two imaging centers as well as two additional unseen centers. The data from different centers varied in many aspects, including scanners used, imaging parameters, and fetal brain super-resolution algorithms applied. 16 teams participated in the challenge, and 17 algorithms were evaluated. Here, a detailed overview and analysis of the challenge results are provided, focusing on the generalizability of the submissions. Both in- and out of domain, the white matter and ventricles were segmented with the highest accuracy, while the most challenging structure remains the cerebral cortex due to anatomical complexity. The FeTA Challenge 2022 was able to successfully evaluate and advance generalizability of multi-class fetal brain tissue segmentation algorithms for MRI and it continues to benchmark new algorithms. The resulting new methods contribute to improving the analysis of brain development in utero.
Abstract:Due to the cross-domain distribution shift aroused from diverse medical imaging systems, many deep learning segmentation methods fail to perform well on unseen data, which limits their real-world applicability. Recent works have shown the benefits of extracting domain-invariant representations on domain generalization. However, the interpretability of domain-invariant features remains a great challenge. To address this problem, we propose an interpretable Bayesian framework (BayeSeg) through Bayesian modeling of image and label statistics to enhance model generalizability for medical image segmentation. Specifically, we first decompose an image into a spatial-correlated variable and a spatial-variant variable, assigning hierarchical Bayesian priors to explicitly force them to model the domain-stable shape and domain-specific appearance information respectively. Then, we model the segmentation as a locally smooth variable only related to the shape. Finally, we develop a variational Bayesian framework to infer the posterior distributions of these explainable variables. The framework is implemented with neural networks, and thus is referred to as deep Bayesian segmentation. Quantitative and qualitative experimental results on prostate segmentation and cardiac segmentation tasks have shown the effectiveness of our proposed method. Moreover, we investigated the interpretability of BayeSeg by explaining the posteriors and analyzed certain factors that affect the generalization ability through further ablation studies. Our code will be released via https://zmiclab.github.io/projects.html, once the manuscript is accepted for publication.
Abstract:Medical images are generally acquired with limited field-of-view (FOV), which could lead to incomplete regions of interest (ROI), and thus impose a great challenge on medical image analysis. This is particularly evident for the learning-based multi-target landmark detection, where algorithms could be misleading to learn primarily the variation of background due to the varying FOV, failing the detection of targets. Based on learning a navigation policy, instead of predicting targets directly, reinforcement learning (RL)-based methods have the potential totackle this challenge in an efficient manner. Inspired by this, in this work we propose a multi-agent RL framework for simultaneous multi-target landmark detection. This framework is aimed to learn from incomplete or (and) complete images to form an implicit knowledge of global structure, which is consolidated during the training stage for the detection of targets from either complete or incomplete test images. To further explicitly exploit the global structural information from incomplete images, we propose to embed a shape model into the RL process. With this prior knowledge, the proposed RL model can not only localize dozens of targetssimultaneously, but also work effectively and robustly in the presence of incomplete images. We validated the applicability and efficacy of the proposed method on various multi-target detection tasks with incomplete images from practical clinics, using body dual-energy X-ray absorptiometry (DXA), cardiac MRI and head CT datasets. Results showed that our method could predict whole set of landmarks with incomplete training images up to 80% missing proportion (average distance error 2.29 cm on body DXA), and could detect unseen landmarks in regions with missing image information outside FOV of target images (average distance error 6.84 mm on 3D half-head CT).
Abstract:Although supervised deep-learning has achieved promising performance in medical image segmentation, many methods cannot generalize well on unseen data, limiting their real-world applicability. To address this problem, we propose a deep learning-based Bayesian framework, which jointly models image and label statistics, utilizing the domain-irrelevant contour of a medical image for segmentation. Specifically, we first decompose an image into components of contour and basis. Then, we model the expected label as a variable only related to the contour. Finally, we develop a variational Bayesian framework to infer the posterior distributions of these variables, including the contour, the basis, and the label. The framework is implemented with neural networks, thus is referred to as deep Bayesian segmentation. Results on the task of cross-sequence cardiac MRI segmentation show that our method set a new state of the art for model generalizability. Particularly, the BayeSeg model trained with LGE MRI generalized well on T2 images and outperformed other models with great margins, i.e., over 0.47 in terms of average Dice. Our code is available at https://zmiclab.github.io/projects.html.
Abstract:Modeling statistics of image priors is useful for image super-resolution, but little attention has been paid from the massive works of deep learning-based methods. In this work, we propose a Bayesian image restoration framework, where natural image statistics are modeled with the combination of smoothness and sparsity priors. Concretely, firstly we consider an ideal image as the sum of a smoothness component and a sparsity residual, and model real image degradation including blurring, downscaling, and noise corruption. Then, we develop a variational Bayesian approach to infer their posteriors. Finally, we implement the variational approach for single image super-resolution (SISR) using deep neural networks, and propose an unsupervised training strategy. The experiments on three image restoration tasks, \textit{i.e.,} ideal SISR, realistic SISR, and real-world SISR, demonstrate that our method has superior model generalizability against varying noise levels and degradation kernels and is effective in unsupervised SISR. The code and resulting models are released via \url{https://zmiclab.github.io/projects.html}.
Abstract:Registration networks have shown great application potentials in medical image analysis. However, supervised training methods have a great demand for large and high-quality labeled datasets, which is time-consuming and sometimes impractical due to data sharing issues. Unsupervised image registration algorithms commonly employ intensity-based similarity measures as loss functions without any manual annotations. These methods estimate the parameterized transformations between pairs of moving and fixed images through the optimization of the network parameters during training. However, these methods become less effective when the image quality varies, e.g., some images are corrupted by substantial noise or artifacts. In this work, we propose a novel approach based on a low-rank representation, i.e., Regnet-LRR, to tackle the problem. We project noisy images into a noise-free low-rank space, and then compute the similarity between the images. Based on the low-rank similarity measure, we train the registration network to predict the dense deformation fields of noisy image pairs. We highlight that the low-rank projection is reformulated in a way that the registration network can successfully update gradients. With two tasks, i.e., cardiac and abdominal intra-modality registration, we demonstrate that the low-rank representation can boost the generalization ability and robustness of models as well as bring significant improvements in noisy data registration scenarios.
Abstract:Super-resolution (SR) is an ill-posed problem, which means that infinitely many high-resolution (HR) images can be degraded to the same low-resolution (LR) image. To study the one-to-many stochastic SR mapping, we implicitly represent the non-local self-similarity of natural images and develop a Variational Sparse framework for Super-Resolution (VSpSR) via neural networks. Since every small patch of a HR image can be well approximated by the sparse representation of atoms in an over-complete dictionary, we design a two-branch module, i.e., VSpM, to explore the SR space. Concretely, one branch of VSpM extracts patch-level basis from the LR input, and the other branch infers pixel-wise variational distributions with respect to the sparse coefficients. By repeatedly sampling coefficients, we could obtain infinite sparse representations, and thus generate diverse HR images. According to the preliminary results of NTIRE 2021 challenge on learning SR space, our team (FudanZmic21) ranks 7-th in terms of released scores. The implementation of VSpSR is released at https://zmiclab.github.io/.
Abstract:The principal rank-one (RO) components of an image represent the self-similarity of the image, which is an important property for image restoration. However, the RO components of a corrupted image could be decimated by the procedure of image denoising. We suggest that the RO property should be utilized and the decimation should be avoided in image restoration. To achieve this, we propose a new framework comprised of two modules, i.e., the RO decomposition and RO reconstruction. The RO decomposition is developed to decompose a corrupted image into the RO components and residual. This is achieved by successively applying RO projections to the image or its residuals to extract the RO components. The RO projections, based on neural networks, extract the closest RO component of an image. The RO reconstruction is aimed to reconstruct the important information, respectively from the RO components and residual, as well as to restore the image from this reconstructed information. Experimental results on four tasks, i.e., noise-free image super-resolution (SR), realistic image SR, gray-scale image denoising, and color image denoising, show that the method is effective and efficient for image restoration, and it delivers superior performance for realistic image SR and color image denoising.