Abstract:Multimodal image-tabular learning is gaining attention, yet it faces challenges due to limited labeled data. While earlier work has applied self-supervised learning (SSL) to unlabeled data, its task-agnostic nature often results in learning suboptimal features for downstream tasks. Semi-supervised learning (SemiSL), which combines labeled and unlabeled data, offers a promising solution. However, existing multimodal SemiSL methods typically focus on unimodal or modality-shared features, ignoring valuable task-relevant modality-specific information, leading to a Modality Information Gap. In this paper, we propose STiL, a novel SemiSL tabular-image framework that addresses this gap by comprehensively exploring task-relevant information. STiL features a new disentangled contrastive consistency module to learn cross-modal invariant representations of shared information while retaining modality-specific information via disentanglement. We also propose a novel consensus-guided pseudo-labeling strategy to generate reliable pseudo-labels based on classifier consensus, along with a new prototype-guided label smoothing technique to refine pseudo-label quality with prototype embeddings, thereby enhancing task-relevant information learning in unlabeled data. Experiments on natural and medical image datasets show that STiL outperforms the state-of-the-art supervised/SSL/SemiSL image/multimodal approaches. Our code is publicly available.
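The consensus idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say which classifiers vote or how agreement is scored, so the function name `consensus_pseudo_label`, the unanimous-vote rule, and the `min_confidence` threshold are illustrative assumptions rather than STiL's actual procedure.

```python
from typing import Optional, Sequence


def argmax(probs: Sequence[float]) -> int:
    """Index of the largest probability (first index wins on ties)."""
    return max(range(len(probs)), key=lambda i: probs[i])


def consensus_pseudo_label(
    classifier_probs: Sequence[Sequence[float]],
    min_confidence: float = 0.9,  # hypothetical threshold, not from the abstract
) -> Optional[int]:
    """Return a pseudo-label only if all classifiers agree and are confident.

    classifier_probs holds one class-probability vector per classifier
    for a single unlabeled sample.
    """
    votes = [argmax(p) for p in classifier_probs]
    if len(set(votes)) != 1:
        return None  # classifiers disagree: leave the sample unlabeled
    label = votes[0]
    # Average the probability each classifier assigns to the agreed class.
    mean_conf = sum(p[label] for p in classifier_probs) / len(classifier_probs)
    return label if mean_conf >= min_confidence else None
```

Samples that return `None` would simply stay unlabeled in this sketch; a softer scheme could down-weight them instead of dropping them.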
Abstract:Accelerated magnetic resonance imaging involves reconstructing fully sampled images from undersampled k-space measurements. Current state-of-the-art approaches have mainly focused on either end-to-end supervised training inspired by compressed sensing formulations, or posterior sampling methods built on modern generative models. However, their efficacy relies heavily on large datasets of fully sampled images, which may not always be available in practice. To address this issue, we propose an unsupervised MRI reconstruction method based on ground-truth-free flow matching (GTF$^2$M). In particular, GTF$^2$M learns a prior denoising process of fully sampled ground-truth images using only undersampled data. Building on this prior, we further propose an efficient cyclic reconstruction algorithm that performs forward and backward integration in the dual space of image-space signal and k-space measurement. We compare our method with state-of-the-art learning-based baselines on the fastMRI dataset, covering both single-coil knee and multi-coil brain MRI. The results show that our unsupervised method significantly outperforms existing unsupervised approaches and achieves performance comparable to most supervised end-to-end and prior learning baselines trained on fully sampled MRI, while offering greater efficiency than the compared generative model-based approaches.
Abstract:Multi-contrast image registration is a challenging task due to the complex intensity relationships between different imaging contrasts. Conventional image registration methods are typically based on iterative optimization for each input image pair, which is time-consuming and sensitive to contrast variations. While learning-based approaches are much faster at inference, generalizability issues mean they can typically only be applied to the fixed contrasts observed during training. In this work, we propose a novel contrast-agnostic deformable image registration framework (CAR) that generalizes to images of arbitrary contrast without observing them during training. In particular, we propose a random convolution-based contrast augmentation scheme, which simulates arbitrary image contrasts from a single image contrast while preserving the inherent structural information. To ensure that the network learns contrast-invariant representations that facilitate contrast-agnostic registration, we further introduce contrast-invariant latent regularization (CLR), which regularizes the representation in latent space through a contrast invariance loss. Experiments show that CAR outperforms the baseline approaches in registration accuracy and generalizes better to unseen imaging contrasts. Code is available at https://github.com/Yinsong0510/CAR.
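As a rough illustration of how a random convolution can change contrast while keeping structure, the sketch below pushes every pixel through the same randomly initialized pointwise (1x1) two-layer mapping. The restriction to 1x1 kernels, the leaky-ReLU nonlinearity, the `hidden` width, and the final rescaling to [0, 1] are all assumptions made for this example; the abstract does not specify the kernel sizes or layer design used in CAR.

```python
import random
from typing import List

Image = List[List[float]]


def random_contrast(image: Image, hidden: int = 4, seed: int = 0) -> Image:
    """Map intensities through a random pointwise (1x1) two-layer network.

    Every pixel goes through the same function, so pixels that share an
    intensity still share one afterwards; only the intensity mapping changes.
    """
    rng = random.Random(seed)
    # Random 1x1 "convolution" weights: 1 -> hidden -> 1 channels.
    w1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    b1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]

    def remap(x: float) -> float:
        out = 0.0
        for a, b, c in zip(w1, b1, w2):
            h = a * x + b
            out += c * (h if h > 0 else 0.2 * h)  # leaky ReLU
        return out

    mapped = [[remap(x) for x in row] for row in image]
    lo = min(min(row) for row in mapped)
    hi = max(max(row) for row in mapped)
    scale = (hi - lo) or 1.0  # avoid dividing by zero on flat images
    return [[(v - lo) / scale for v in row] for row in mapped]
```

Calling `random_contrast` with different `seed` values yields differently remapped versions of the same image, which could serve as synthetic contrasts of a single source image during training.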
Abstract:Current deep learning approaches in medical image registration usually face the challenges of distribution shift and data collection, hindering real-world deployment. In contrast, universal medical image registration aims to perform registration on a wide range of clinically relevant tasks simultaneously, thus having tremendous potential for clinical applications. In this paper, we present the first attempt to achieve universal 3D medical image registration in sequential learning scenarios by proposing a continual learning method. Specifically, we utilize meta-learning with experience replay to mitigate the problem of catastrophic forgetting. To promote the generalizability of meta-continual learning, we further propose sharpness-aware meta-continual learning (SAMCL). We validate the effectiveness of our method on four datasets in a continual learning setup, including brain MR, abdomen CT, lung CT, and abdomen MR-CT image pairs. Results show the potential of SAMCL in realizing universal image registration: it performs better than or on par with vanilla sequential or centralized multi-task training strategies. The source code will be available from https://github.com/xzluo97/Continual-Reg.
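The "sharpness-aware" ingredient can be sketched with the generic sharpness-aware minimization (SAM) update on a toy quadratic loss. This is only the standard SAM step; how SAMCL combines it with meta-learning and experience replay is not described in the abstract. The names `sam_step` and `toy_grad`, the toy loss, and the values of `lr` and `rho` are assumptions for illustration.

```python
import math
from typing import Callable, List

Vector = List[float]


def sam_step(
    w: Vector,
    grad_fn: Callable[[Vector], Vector],
    lr: float = 0.1,   # illustrative learning rate
    rho: float = 0.05,  # illustrative perturbation radius
) -> Vector:
    """One sharpness-aware minimization (SAM) update on parameters w."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0  # guard zero gradient
    # Ascent step: move to the approximate worst case within radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent step: apply the gradient taken at the perturbed point to w.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def toy_grad(w: Vector) -> Vector:
    """Gradient of the toy loss L(w) = w0^2 + 2 * w1^2."""
    return [2.0 * w[0], 4.0 * w[1]]
```

Repeated calls such as `w = sam_step(w, toy_grad)` drive `w` toward the minimum at the origin, where it settles into a small neighborhood whose size depends on `rho`.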
Abstract:This article presents a general Bayesian learning framework for multi-modal groupwise registration on medical images. The method builds on probabilistic modelling of the image generative process, where the underlying common anatomy and geometric variations of the observed images are explicitly disentangled as latent variables. Thus, groupwise registration is achieved through the solution to Bayesian inference. We propose a novel hierarchical variational auto-encoding architecture to realize the inference procedure of the latent variables, where the registration parameters can be calculated in a mathematically interpretable fashion. Remarkably, this new paradigm can learn groupwise registration in an unsupervised closed-loop self-reconstruction process, sparing the burden of designing complex intensity-based similarity measures. The computationally efficient disentangled architecture is also inherently scalable and flexible, allowing for groupwise registration on large-scale image groups with variable sizes. Furthermore, the inferred structural representations from disentanglement learning are capable of capturing the latent anatomy of the observations with visual semantics. Extensive experiments were conducted to validate the proposed framework, including four datasets from cardiac, brain and abdominal medical images. The results have demonstrated the superiority of our method over conventional similarity-based approaches in terms of accuracy, efficiency, scalability and interpretability.
Abstract:This paper presents a generic probabilistic framework for estimating the statistical dependency and finding the anatomical correspondences among an arbitrary number of medical images. The method builds on a novel formulation of the $N$-dimensional joint intensity distribution by representing the common anatomy as latent variables and estimating the appearance model with nonparametric estimators. Through connection to maximum likelihood and the expectation-maximization algorithm, an information-theoretic metric called $\mathcal{X}$-metric and a co-registration algorithm named $\mathcal{X}$-CoReg are induced, allowing groupwise registration of the $N$ observed images with computational complexity of $\mathcal{O}(N)$. Moreover, the method naturally extends to a weakly-supervised scenario where anatomical labels of certain images are provided. This leads to a combined-computing framework implemented with deep learning, which performs registration and segmentation simultaneously and collaboratively in an end-to-end fashion. Extensive experiments were conducted to demonstrate the versatility and applicability of our model, including multimodal groupwise registration, motion correction for dynamic contrast enhanced magnetic resonance images, and deep combined computing for multimodal medical images. Results show the superiority of our method in various applications in terms of both accuracy and efficiency, highlighting the advantage of the proposed representation of the imaging process.
Abstract:Previous methods for multimodal groupwise registration typically require highly specialized similarity metrics with limited applicability. In this work, we instead propose a general framework which formulates groupwise registration as a procedure of hierarchical Bayesian inference. Here, the imaging process of multimodal medical images, including shape transition and appearance variation, is characterized by a disentangled variational auto-encoder. To this end, we propose a novel variational posterior and network architecture that facilitate joint learning of the common structural representation and the desired spatial correspondences. The performance of the proposed model was validated on two publicly available multimodal datasets, i.e., BrainWeb and MS-CMR of the heart. Results demonstrate the efficacy of our framework in realizing multimodal groupwise registration in an end-to-end fashion.
Abstract:Assessment of myocardial viability is essential in the diagnosis and treatment management of patients suffering from myocardial infarction, and classification of myocardial pathology is the key to this assessment. This work defines a new task of medical image analysis, i.e., to perform myocardial pathology segmentation (MyoPS) combining three-sequence cardiac magnetic resonance (CMR) images, which was first proposed in the MyoPS challenge, held in conjunction with MICCAI 2020. The challenge provided 45 paired and pre-aligned CMR images, allowing algorithms to combine the complementary information from the three CMR sequences for pathology segmentation. In this article, we provide details of the challenge, survey the works from fifteen participants and interpret their methods according to five aspects, i.e., preprocessing, data augmentation, learning strategy, model architecture and post-processing. In addition, we analyze the results with respect to different factors, in order to examine the key obstacles and explore potential solutions, as well as to provide a benchmark for future research. We conclude that while promising results have been reported, the research is still at an early stage, and more in-depth exploration is needed before successful clinical application. Note that the MyoPS data and evaluation tool remain publicly available upon registration via the challenge homepage (www.sdspeople.fudan.edu.cn/zhuangxiahai/0/myops20/).
Abstract:Registration networks have shown great potential in medical image analysis. However, supervised training methods demand large, high-quality labeled datasets, which are time-consuming to collect and sometimes impractical to obtain due to data sharing issues. Unsupervised image registration algorithms commonly employ intensity-based similarity measures as loss functions without any manual annotations. These methods estimate the parameterized transformations between pairs of moving and fixed images through the optimization of the network parameters during training. However, these methods become less effective when the image quality varies, e.g., when some images are corrupted by substantial noise or artifacts. In this work, we propose a novel approach based on a low-rank representation, i.e., Regnet-LRR, to tackle the problem. We project noisy images into a noise-free low-rank space, and then compute the similarity between the images. Based on the low-rank similarity measure, we train the registration network to predict the dense deformation fields of noisy image pairs. We highlight that the low-rank projection is reformulated in a way that allows the registration network to be trained with gradient-based updates. With two tasks, i.e., cardiac and abdominal intra-modality registration, we demonstrate that the low-rank representation can boost the generalization ability and robustness of models as well as bring significant improvements in noisy data registration scenarios.
Abstract:Pathological area segmentation in cardiac magnetic resonance (MR) images plays a vital role in the clinical diagnosis of cardiovascular diseases. Because pathological areas are irregular in shape and small in size, their segmentation has always been a challenging task. We propose an anatomy-prior-based framework, which combines the U-Net segmentation network with the attention technique. Leveraging the fact that the pathological areas are contained within the myocardium, we propose a neighborhood penalty strategy to gauge the inclusion relationship between the myocardium and the myocardial infarction and no-reflow areas. This neighborhood penalty strategy can be applied to any two labels with an inclusive relationship (such as the whole infarction and the myocardium) to form a neighboring loss. The proposed framework is evaluated on the EMIDEC dataset. Results show that our framework is effective in pathological area segmentation.