Abstract:Distribution shifts widely exist in medical images acquired from different medical centers, hindering the deployment of semantic segmentation models trained on data from one center (source domain) to another (target domain). While unsupervised domain adaptation (UDA) has shown significant promise in mitigating these shifts, it poses privacy risks due to sharing data between centers. To facilitate adaptation while preserving data privacy, source-free domain adaptation (SFDA) and test-time adaptation (TTA) have emerged as effective paradigms, relying solely on target domain data. However, the scenarios currently addressed by SFDA and TTA are limited, making them less suitable for clinical applications. In a more realistic clinical scenario, the pre-trained model is deployed in a medical centre to assist with clinical tasks during the day and rest at night. During the daytime process, TTA can be employed to enhance inference performance. During the nighttime process, after collecting the test data from the day, the model can be fine-tuned utilizing SFDA to further adapt to the target domain. With above insights, we propose a novel adaptation framework called Day-Night Adaptation (DyNA). This framework adapts the model to the target domain through day-night loops without requiring access to source data. Specifically, we implement distinct adaptation strategies for daytime and nighttime to better meet the demands of clinical settings. During the daytime, model parameters are frozen, and a specific low-frequency prompt is trained for each test sample. Additionally, we construct a memory bank for prompt initialization and develop a warm-up mechanism to enhance prompt training. During nighttime, we integrate a global student model into the traditional teacher-student self-training paradigm to fine-tune the model while ensuring training stability...
Abstract:Although recent years have witnessed significant advancements in medical image segmentation, the pervasive issue of domain shift among medical images from diverse centres hinders the effective deployment of pre-trained models. Many Test-time Adaptation (TTA) methods have been proposed to address this issue by fine-tuning pre-trained models with test data during inference. These methods, however, often suffer from less-satisfactory optimization due to suboptimal optimization direction (dictated by the gradient) and fixed step-size (predicated on the learning rate). In this paper, we propose the Gradient alignment-based Test-time adaptation (GraTa) method to improve both the gradient direction and learning rate in the optimization procedure. Unlike conventional TTA methods, which primarily optimize the pseudo gradient derived from a self-supervised objective, our method incorporates an auxiliary gradient with the pseudo one to facilitate gradient alignment. Such gradient alignment enables the model to excavate the similarities between different gradients and correct the gradient direction to approximate the empirical gradient related to the current segmentation task. Additionally, we design a dynamic learning rate based on the cosine similarity between the pseudo and auxiliary gradients, thereby empowering the adaptive fine-tuning of pre-trained models on diverse test data. Extensive experiments establish the effectiveness of the proposed gradient alignment and dynamic learning rate and substantiate the superiority of our GraTa method over other state-of-the-art TTA methods on a benchmark medical image segmentation task. The code and weights of pre-trained source models will be available.
Abstract:Accurate segmentation of polyps from colonoscopy videos is of great significance to polyp treatment and early prevention of colorectal cancer. However, it is challenging due to the difficulties associated with modelling long-range spatio-temporal relationships within a colonoscopy video. In this paper, we address this challenging task with a novel Mixture-Attention Siamese Transformer (MAST), which explicitly models the long-range spatio-temporal relationships with a mixture-attention mechanism for accurate polyp segmentation. Specifically, we first construct a Siamese transformer architecture to jointly encode paired video frames for their feature representations. We then design a mixture-attention module to exploit the intra-frame and inter-frame correlations, enhancing the features with rich spatio-temporal relationships. Finally, the enhanced features are fed to two parallel decoders for predicting the segmentation maps. To the best of our knowledge, our MAST is the first transformer model dedicated to video polyp segmentation. Extensive experiments on the large-scale SUN-SEG benchmark demonstrate the superior performance of MAST in comparison with the cutting-edge competitors. Our code is publicly available at https://github.com/Junqing-Yang/MAST.
Abstract:Distribution shift widely exists in medical images acquired from different medical centres and poses a significant obstacle to deploying the pre-trained semantic segmentation model in real-world applications. Test-time adaptation has proven its effectiveness in tackling the cross-domain distribution shift during inference. However, most existing methods achieve adaptation by updating the pre-trained models, rendering them susceptible to error accumulation and catastrophic forgetting when encountering a series of distribution shifts (i.e., under the continual test-time adaptation setup). To overcome these challenges caused by updating the models, in this paper, we freeze the pre-trained model and propose the Visual Prompt-based Test-Time Adaptation (VPTTA) method to train a specific prompt for each test image to align the statistics in the batch normalization layers. Specifically, we present the low-frequency prompt, which is lightweight with only a few parameters and can be effectively trained in a single iteration. To enhance prompt initialization, we equip VPTTA with a memory bank to benefit the current prompt from previous ones. Additionally, we design a warm-up mechanism, which mixes source and target statistics to construct warm-up statistics, thereby facilitating the training process. Extensive experiments demonstrate the superiority of our VPTTA over other state-of-the-art methods on two medical image segmentation benchmark tasks. The code and weights of pre-trained source models are available at https://github.com/Chen-Ziyang/VPTTA.
Abstract:Although recent years have witnessed the great success of convolutional neural networks (CNNs) in medical image segmentation, the domain shift issue caused by the highly variable image quality of medical images hinders the deployment of CNNs in real-world clinical applications. Domain generalization (DG) methods aim to address this issue by training a robust model on the source domain, which has a strong generalization ability. Previously, many DG methods based on feature-space domain randomization have been proposed, which, however, suffer from the limited and unordered search space of feature styles. In this paper, we propose a multi-source DG method called Treasure in Distribution (TriD), which constructs an unprecedented search space to obtain the model with strong robustness by randomly sampling from a uniform distribution. To learn the domain-invariant representations explicitly, we further devise a style-mixing strategy in our TriD, which mixes the feature styles by randomly mixing the augmented and original statistics along the channel wise and can be extended to other DG methods. Extensive experiments on two medical segmentation tasks with different modalities demonstrate that our TriD achieves superior generalization performance on unseen target-domain data. Code is available at https://github.com/Chen-Ziyang/TriD.
Abstract:Glaucoma is one of the leading causes of irreversible blindness. Segmentation of optic disc (OD) and optic cup (OC) on fundus images is a crucial step in glaucoma screening. Although many deep learning models have been constructed for this task, it remains challenging to train an OD/OC segmentation model that could be deployed successfully to different healthcare centers. The difficulties mainly comes from the domain shift issue, i.e., the fundus images collected at these centers usually vary greatly in the tone, contrast, and brightness. To address this issue, in this paper, we propose a novel unsupervised domain adaptation (UDA) method called Reconstruction-driven Dynamic Refinement Network (RDR-Net), where we employ a due-path segmentation backbone for simultaneous edge detection and region prediction and design three modules to alleviate the domain gap. The reconstruction alignment (RA) module uses a variational auto-encoder (VAE) to reconstruct the input image and thus boosts the image representation ability of the network in a self-supervised way. It also uses a style-consistency constraint to force the network to retain more domain-invariant information. The low-level feature refinement (LFR) module employs input-specific dynamic convolutions to suppress the domain-variant information in the obtained low-level features. The prediction-map alignment (PMA) module elaborates the entropy-driven adversarial learning to encourage the network to generate source-like boundaries and regions. We evaluated our RDR-Net against state-of-the-art solutions on four public fundus image datasets. Our results indicate that RDR-Net is superior to competing models in both segmentation performance and generalization ability
Abstract:Image registration of liver dynamic contrast-enhanced computed tomography (DCE-CT) is crucial for diagnosis and image-guided surgical planning of liver cancer. However, intensity variations due to the flow of contrast agents combined with complex spatial motion induced by respiration brings great challenge to existing intensity-based registration methods. To address these problems, we propose a novel structure-aware registration method by incorporating structural information of related organs with segmentation-guided deep registration network. Existing segmentation-guided registration methods only focus on volumetric registration inside the paired organ segmentations, ignoring the inherent attributes of their anatomical structures. In addition, such paired organ segmentations are not always available in DCE-CT images due to the flow of contrast agents. Different from existing segmentation-guided registration methods, our proposed method extracts structural information in hierarchical geometric perspectives of line and surface. Then, according to the extracted structural information, structure-aware constraints are constructed and imposed on the forward and backward deformation field simultaneously. In this way, all available organ segmentations, including unpaired ones, can be fully utilized to avoid the side effect of contrast agent and preserve the topology of organs during registration. Extensive experiments on an in-house liver DCE-CT dataset and a public LiTS dataset show that our proposed method can achieve higher registration accuracy and preserve anatomical structure more effectively than state-of-the-art methods.
Abstract:Multi-modal medical image completion has been extensively applied to alleviate the missing modality issue in a wealth of multi-modal diagnostic tasks. However, for most existing synthesis methods, their inferences of missing modalities can collapse into a deterministic mapping from the available ones, ignoring the uncertainties inherent in the cross-modal relationships. Here, we propose the Unified Multi-Modal Conditional Score-based Generative Model (UMM-CSGM) to take advantage of Score-based Generative Model (SGM) in modeling and stochastically sampling a target probability distribution, and further extend SGM to cross-modal conditional synthesis for various missing-modality configurations in a unified framework. Specifically, UMM-CSGM employs a novel multi-in multi-out Conditional Score Network (mm-CSN) to learn a comprehensive set of cross-modal conditional distributions via conditional diffusion and reverse generation in the complete modality space. In this way, the generation process can be accurately conditioned by all available information, and can fit all possible configurations of missing modalities in a single network. Experiments on BraTS19 dataset show that the UMM-CSGM can more reliably synthesize the heterogeneous enhancement and irregular area in tumor-induced lesions for any missing modalities.
Abstract:In clinical practice, a segmentation network is often required to continually learn on a sequential data stream from multiple sites rather than a consolidated set, due to the storage cost and privacy restriction. However, during the continual learning process, existing methods are usually restricted in either network memorizability on previous sites or generalizability on unseen sites. This paper aims to tackle the challenging problem of Synchronous Memorizability and Generalizability (SMG) and to simultaneously improve performance on both previous and unseen sites, with a novel proposed SMG-learning framework. First, we propose a Synchronous Gradient Alignment (SGA) objective, which \emph{not only} promotes the network memorizability by enforcing coordinated optimization for a small exemplar set from previous sites (called replay buffer), \emph{but also} enhances the generalizability by facilitating site-invariance under simulated domain shift. Second, to simplify the optimization of SGA objective, we design a Dual-Meta algorithm that approximates the SGA objective as dual meta-objectives for optimization without expensive computation overhead. Third, for efficient rehearsal, we configure the replay buffer comprehensively considering additional inter-site diversity to reduce redundancy. Experiments on prostate MRI data sequentially acquired from six institutes demonstrate that our method can simultaneously achieve higher memorizability and generalizability over state-of-the-art methods. Code is available at https://github.com/jingyzhang/SMG-Learning.
Abstract:We recognize that the skin lesion diagnosis is an essential and challenging sub-task in Image classification, in which the Fisher vector (FV) encoding algorithm and deep convolutional neural network (DCNN) are two of the most successful techniques. Since the joint use of FV and DCNN has demonstrated proven success, the joint techniques could have discriminatory power on skin lesion diagnosis as well. To this hypothesis, we propose the aggregation algorithm for skin lesion diagnosis that utilize the residual network to extract the local features and the Fisher vector method to aggregate the local features to image-level representation. We applied our algorithm on the International Skin Imaging Collaboration 2018 (ISIC2018) challenge and only focus on the third task, i.e., the disease classification.