from the iSTAGING consortium, for the ADNI
Abstract:Deep learning has made significant strides in automated brain tumor segmentation from magnetic resonance imaging (MRI) scans in recent years. However, the reliability of these tools is hampered by the presence of poor-quality segmentation outliers, particularly in out-of-distribution samples, making their implementation in clinical practice difficult. Therefore, there is a need for quality control (QC) to screen the quality of the segmentation results. Although numerous automatic QC methods have been developed for segmentation quality screening, most were designed for cardiac MRI segmentation, which involves a single modality and a single tissue type. Furthermore, most prior works only provided subject-level predictions of segmentation quality and did not identify erroneous parts segmentation that may require refinement. To address these limitations, we proposed a novel multi-task deep learning architecture, termed QCResUNet, which produces subject-level segmentation-quality measures as well as voxel-level segmentation error maps for each available tissue class. To validate the effectiveness of the proposed method, we conducted experiments on assessing its performance on evaluating the quality of two distinct segmentation tasks. First, we aimed to assess the quality of brain tumor segmentation results. For this task, we performed experiments on one internal and two external datasets. Second, we aimed to evaluate the segmentation quality of cardiac Magnetic Resonance Imaging (MRI) data from the Automated Cardiac Diagnosis Challenge. The proposed method achieved high performance in predicting subject-level segmentation-quality metrics and accurately identifying segmentation errors on a voxel basis. This has the potential to be used to guide human-in-the-loop feedback to improve segmentations in clinical settings.
Abstract:Convolutional neural networks (CNNs) have shown great effectiveness in medical image segmentation. However, they may be limited in modeling large inter-subject variations in organ shapes and sizes and exploiting global long-range contextual information. This is because CNNs typically employ convolutions with fixed-sized local receptive fields and lack the mechanisms to utilize global information. To address these limitations, we developed Dynamic Multi-Resolution Convolution (DMRC) and Dynamic Multi-Scale Convolution (DMSC) modules. Both modules enhance the representation capabilities of single convolutions to capture varying scaled features and global contextual information. This is achieved in the DMRC module by employing a convolutional filter on images with different resolutions and subsequently utilizing dynamic mechanisms to model global inter-dependencies between features. In contrast, the DMSC module extracts features at different scales by employing convolutions with different kernel sizes and utilizing dynamic mechanisms to extract global contextual information. The utilization of convolutions with different kernel sizes in the DMSC module may increase computational complexity. To lessen this burden, we propose to use a lightweight design for convolution layers with a large kernel size. Thus, DMSC and DMRC modules are designed as lightweight drop-in replacements for single convolutions, and they can be easily integrated into general CNN architectures for end-to-end training. The segmentation network was proposed by incorporating our DMSC and DMRC modules into a standard U-Net architecture, termed Dynamic Multi-scale and Multi-resolution Convolution network (DMC-Net). The results demonstrate that our proposed DMSC and DMRC can enhance the representation capabilities of single convolutions and improve segmentation accuracy.
Abstract:Multiple instance learning (MIL) stands as a powerful approach in weakly supervised learning, regularly employed in histological whole slide image (WSI) classification for detecting tumorous lesions. However, existing mainstream MIL methods focus on modeling correlation between instances while overlooking the inherent diversity among instances. However, few MIL methods have aimed at diversity modeling, which empirically show inferior performance but with a high computational cost. To bridge this gap, we propose a novel MIL aggregation method based on diverse global representation (DGR-MIL), by modeling diversity among instances through a set of global vectors that serve as a summary of all instances. First, we turn the instance correlation into the similarity between instance embeddings and the predefined global vectors through a cross-attention mechanism. This stems from the fact that similar instance embeddings typically would result in a higher correlation with a certain global vector. Second, we propose two mechanisms to enforce the diversity among the global vectors to be more descriptive of the entire bag: (i) positive instance alignment and (ii) a novel, efficient, and theoretically guaranteed diversification learning paradigm. Specifically, the positive instance alignment module encourages the global vectors to align with the center of positive instances (e.g., instances containing tumors in WSI). To further diversify the global representations, we propose a novel diversification learning paradigm leveraging the determinantal point process. The proposed model outperforms the state-of-the-art MIL aggregation models by a substantial margin on the CAMELYON-16 and the TCGA-lung cancer datasets. The code is available at \url{https://github.com/ChongQingNoSubway/DGR-MIL}.
Abstract:Since its introduction, UNet has been leading a variety of medical image segmentation tasks. Although numerous follow-up studies have also been dedicated to improving the performance of standard UNet, few have conducted in-depth analyses of the underlying interest pattern of UNet in medical image segmentation. In this paper, we explore the patterns learned in a UNet and observe two important factors that potentially affect its performance: (i) irrelative feature learned caused by asymmetric supervision; (ii) feature redundancy in the feature map. To this end, we propose to balance the supervision between encoder and decoder and reduce the redundant information in the UNet. Specifically, we use the feature map that contains the most semantic information (i.e., the last layer of the decoder) to provide additional supervision to other blocks to provide additional supervision and reduce feature redundancy by leveraging feature distillation. The proposed method can be easily integrated into existing UNet architecture in a plug-and-play fashion with negligible computational cost. The experimental results suggest that the proposed method consistently improves the performance of standard UNets on four medical image segmentation datasets. The code is available at \url{https://github.com/ChongQingNoSubway/SelfReg-UNet}
Abstract:Deep neural networks, including transformers and convolutional neural networks, have significantly improved multivariate time series classification (MTSC). However, these methods often rely on supervised learning, which does not fully account for the sparsity and locality of patterns in time series data (e.g., diseases-related anomalous points in ECG). To address this challenge, we formally reformulate MTSC as a weakly supervised problem, introducing a novel multiple-instance learning (MIL) framework for better localization of patterns of interest and modeling time dependencies within time series. Our novel approach, TimeMIL, formulates the temporal correlation and ordering within a time-aware MIL pooling, leveraging a tokenized transformer with a specialized learnable wavelet positional token. The proposed method surpassed 26 recent state-of-the-art methods, underscoring the effectiveness of the weakly supervised TimeMIL in MTSC.
Abstract:Early identification of Mild Cognitive Impairment (MCI) subjects who will eventually progress to Alzheimer Disease (AD) is challenging. Existing deep learning models are mostly single-modality single-task models predicting risk of disease progression at a fixed timepoint. We proposed a multimodal hierarchical multi-task learning approach which can monitor the risk of disease progression at each timepoint of the visit trajectory. Longitudinal visit data from multiple modalities (MRI, cognition, and clinical data) were collected from MCI individuals of the Alzheimer Disease Neuroimaging Initiative (ADNI) dataset. Our hierarchical model predicted at every timepoint a set of neuropsychological composite cognitive function scores as auxiliary tasks and used the forecasted scores at every timepoint to predict the future risk of disease. Relevance weights for each composite function provided explanations about potential factors for disease progression. Our proposed model performed better than state-of-the-art baselines in predicting AD progression risk and the composite scores. Ablation study on the number of modalities demonstrated that imaging and cognition data contributed most towards the outcome. Model explanations at each timepoint can inform clinicians 6 months in advance the potential cognitive function decline that can lead to progression to AD in future. Our model monitored their risk of AD progression every 6 months throughout the visit trajectory of individuals. The hierarchical learning of auxiliary tasks allowed better optimization and allowed longitudinal explanations for the outcome. Our framework is flexible with the number of input modalities and the selection of auxiliary tasks and hence can be generalized to other clinical problems too.
Abstract:Alzheimer Disease (AD) is a multi-faceted disorder, with each modality providing unique and complementary info about AD. In this study, we used a deep-learning based multimodal normative model to assess the heterogeneity in regional brain patterns for ATN (amyloid-tau-neurodegeneration) biomarkers. We selected discovery (n = 665) and replication (n = 430) cohorts with simultaneous availability of ATN biomarkers: Florbetapir amyloid, Flortaucipir tau and T1-weighted MRI (magnetic resonance imaging) imaging. A multimodal variational autoencoder (conditioned on age and sex) was used as a normative model to learn the multimodal regional brain patterns of a cognitively unimpaired (CU) control group. The trained model was applied on individuals on the ADS (AD Spectrum) to estimate their deviations (Z-scores) from the normative distribution, resulting in a Z-score regional deviation map per ADS individual per modality. ADS individuals with moderate or severe dementia showed higher proportion of regional outliers for each modality as well as more dissimilarity in modality-specific regional outlier patterns compared to ADS individuals with early or mild dementia. DSI was associated with the progressive stages of dementia, (ii) showed significant associations with neuropsychological composite scores and (iii) related to the longitudinal risk of CDR progression. Findings were reproducible in both discovery and replication cohorts. Our is the first study to examine the heterogeneity in AD through the lens of multiple neuroimaging modalities (ATN), based on distinct or overlapping patterns of regional outlier deviations. Regional MRI and tau outliers were more heterogenous than regional amyloid outliers. DSI has the potential to be an individual patient metric of neurodegeneration that can help in clinical decision making and monitoring patient response for anti-amyloid treatments.
Abstract:In the past decades, deep neural networks, particularly convolutional neural networks, have achieved state-of-the-art performance in a variety of medical image segmentation tasks. Recently, the introduction of the vision transformer (ViT) has significantly altered the landscape of deep segmentation models. There has been a growing focus on ViTs, driven by their excellent performance and scalability. However, we argue that the current design of the vision transformer-based UNet (ViT-UNet) segmentation models may not effectively handle the heterogeneous appearance (e.g., varying shapes and sizes) of objects of interest in medical image segmentation tasks. To tackle this challenge, we present a structured approach to introduce spatially dynamic components to the ViT-UNet. This adaptation enables the model to effectively capture features of target objects with diverse appearances. This is achieved by three main components: \textbf{(i)} deformable patch embedding; \textbf{(ii)} spatially dynamic multi-head attention; \textbf{(iii)} deformable positional encoding. These components were integrated into a novel architecture, termed AgileFormer. AgileFormer is a spatially agile ViT-UNet designed for medical image segmentation. Experiments in three segmentation tasks using publicly available datasets demonstrated the effectiveness of the proposed method. The code is available at \href{https://github.com/sotiraslab/AgileFormer}{https://github.com/sotiraslab/AgileFormer}.
Abstract:Hierarchical transformers have achieved significant success in medical image segmentation due to their large receptive field and capabilities of effectively leveraging global long-range contextual information. Convolutional neural networks (CNNs) can also deliver a large receptive field by using large kernels, enabling them to achieve competitive performance with fewer model parameters. However, CNNs incorporated with large convolutional kernels remain constrained in adaptively capturing multi-scale features from organs with large variations in shape and size due to the employment of fixed-sized kernels. Additionally, they are unable to utilize global contextual information efficiently. To address these limitations, we propose Dynamic Large Kernel (DLK) and Dynamic Feature Fusion (DFF) modules. The DLK module employs multiple large kernels with varying kernel sizes and dilation rates to capture multi-scale features. Subsequently, a dynamic selection mechanism is utilized to adaptively highlight the most important spatial features based on global information. Additionally, the DFF module is proposed to adaptively fuse multi-scale local feature maps based on their global information. We integrate DLK and DFF in a hierarchical transformer architecture to develop a novel architecture, termed D-Net. D-Net is able to effectively utilize a multi-scale large receptive field and adaptively harness global contextual information. Extensive experimental results demonstrate that D-Net outperforms other state-of-the-art models in the two volumetric segmentation tasks, including abdominal multi-organ segmentation and multi-modality brain tumor segmentation. Our code is available at https://github.com/sotiraslab/DLK.
Abstract:U-Net has been widely used for segmenting abdominal organs, achieving promising performance. However, when it is used for multi-organ segmentation, first, it may be limited in exploiting global long-range contextual information due to the implementation of standard convolutions. Second, the use of spatial-wise downsampling (e.g., max pooling or strided convolutions) in the encoding path may lead to the loss of deformable or discriminative details. Third, features upsampled from the higher level are concatenated with those that persevered via skip connections. However, repeated downsampling and upsampling operations lead to misalignments between them and their concatenation degrades segmentation performance. To address these limitations, we propose Dynamically Calibrated Convolution (DCC), Dynamically Calibrated Downsampling (DCD), and Dynamically Calibrated Upsampling (DCU) modules, respectively. The DCC module can utilize global inter-dependencies between spatial and channel features to calibrate these features adaptively. The DCD module enables networks to adaptively preserve deformable or discriminative features during downsampling. The DCU module can dynamically align and calibrate upsampled features to eliminate misalignments before concatenations. We integrated the proposed modules into a standard U-Net, resulting in a new architecture, termed Dynamic U-Net. This architectural design enables U-Net to dynamically adjust features for different organs. We evaluated Dynamic U-Net in two abdominal multi-organ segmentation benchmarks. Dynamic U-Net achieved statistically improved segmentation accuracy compared with standard U-Net. Our code is available at https://github.com/sotiraslab/DynamicUNet.