Abstract:Convolutional neural networks (CNNs) are essential tools for computer vision tasks, but they lack traditionally desired properties of extracted features that could further improve model performance, e.g., rotational equivariance. Such properties are ubiquitous in biomedical images, which often lack explicit orientation. While current work largely relies on data augmentation or explicit modules to capture orientation information, this comes at the expense of increased training costs or ineffective approximations of the desired equivariance. To overcome these challenges, we propose a novel and efficient implementation of the Symmetric Rotation-Equivariant (SRE) Convolution (SRE-Conv) kernel, designed to learn rotation-invariant features while simultaneously compressing the model size. The SRE-Conv kernel can easily be incorporated into any CNN backbone. We validate the ability of a deep SRE-CNN to capture equivariance to rotation using the public MedMNISTv2 dataset (16 total tasks). SRE-Conv-CNN demonstrated improved rotated image classification performance accuracy on all 16 test datasets in both 2D and 3D images, all while increasing efficiency with fewer parameters and reduced memory footprint. The code is available at https://github.com/XYPB/SRE-Conv.
Abstract:Contrastive Language-Image Pre-training (CLIP) shows promise in medical image analysis but requires substantial data and computational resources. Due to these restrictions, existing CLIP applications in medical imaging focus mainly on modalities like chest X-rays that have abundant image-report data available, leaving many other important modalities under-explored. Here, we propose the first adaptation of the full CLIP model to mammography, which presents significant challenges due to labeled data scarcity, high-resolution images with small regions of interest, and data imbalance. We first develop a specialized supervision framework for mammography that leverages its multi-view nature. Furthermore, we design a symmetric local alignment module to better focus on detailed features in high-resolution images. Lastly, we incorporate a parameter-efficient fine-tuning approach for large language models pre-trained with medical knowledge to address data limitations. Our multi-view and multi-scale alignment (MaMA) method outperforms state-of-the-art baselines for three different tasks on two large real-world mammography datasets, EMBED and RSNA-Mammo, with only 52% model size compared with the largest baseline.
Abstract:Recent advancements in Contrastive Language-Image Pre-training (CLIP) have demonstrated notable success in self-supervised representation learning across various tasks. However, the existing CLIP-like approaches often demand extensive GPU resources and prolonged training times due to the considerable size of the model and dataset, making them poor for medical applications, in which large datasets are not always common. Meanwhile, the language model prompts are mainly manually derived from labels tied to images, potentially overlooking the richness of information within training samples. We introduce a novel language-image Contrastive Learning method with an Efficient large language model and prompt Fine-Tuning (CLEFT) that harnesses the strengths of the extensive pre-trained language and visual models. Furthermore, we present an efficient strategy for learning context-based prompts that mitigates the gap between informative clinical diagnostic data and simple class labels. Our method demonstrates state-of-the-art performance on multiple chest X-ray and mammography datasets compared with various baselines. The proposed parameter efficient framework can reduce the total trainable model size by 39% and reduce the trainable language model to only 4% compared with the current BERT encoder.
Abstract:Background: We propose a novel method to identify who may likely have clonal hematopoiesis of indeterminate potential (CHIP), a condition characterized by the presence of somatic mutations in hematopoietic stem cells without detectable hematologic malignancy, using deep learning techniques. Methods: We developed a convolutional neural network (CNN) to predict CHIP status using 4 different views from standard delayed gadolinium-enhanced cardiac magnetic resonance imaging (CMR). We used 5-fold cross validation on 82 cardio-oncology patients to assess the performance of our model. Different algorithms were compared to find the optimal patient-level prediction method using the image-level CNN predictions. Results: We found that the best model had an area under the receiver operating characteristic curve of 0.85 and an accuracy of 82%. Conclusions: We conclude that a deep learning-based diagnostic approach for CHIP using CMR is promising.
Abstract:Task-based fMRI uses actions or stimuli to trigger task-specific brain responses and measures them using BOLD contrast. Despite the significant task-induced spatiotemporal brain activation fluctuations, most studies on task-based fMRI ignore the task context information aligned with fMRI and consider task-based fMRI a coherent sequence. In this paper, we show that using the task structures as data-driven guidance is effective for spatiotemporal analysis. We propose STNAGNN, a GNN-based spatiotemporal architecture, and validate its performance in an autism classification task. The trained model is also interpreted for identifying autism-related spatiotemporal brain biomarkers.
Abstract:Digital Breast Tomosynthesis (DBT) is a widely used medical imaging modality for breast cancer screening and diagnosis, offering higher spatial resolution and greater detail through its 3D-like breast volume imaging capability. However, the increased data volume also introduces pronounced data imbalance challenges, where only a small fraction of the volume contains suspicious tissue. This further exacerbates the data imbalance due to the case-level distribution in real-world data and leads to learning a trivial classification model that only predicts the majority class. To address this, we propose a novel method using view-level contrastive Self-supervised Initialization and Fine-Tuning for identifying abnormal DBT images, namely SIFT-DBT. We further introduce a patch-level multi-instance learning method to preserve spatial resolution. The proposed method achieves 92.69% volume-wise AUC on an evaluation of 970 unique studies.
Abstract:Inter-frame motion in dynamic cardiac positron emission tomography (PET) using rubidium-82 (82-Rb) myocardial perfusion imaging impacts myocardial blood flow (MBF) quantification and the diagnosis accuracy of coronary artery diseases. However, the high cross-frame distribution variation due to rapid tracer kinetics poses a considerable challenge for inter-frame motion correction, especially for early frames where intensity-based image registration techniques often fail. To address this issue, we propose a novel method called Temporally and Anatomically Informed Generative Adversarial Network (TAI-GAN) that utilizes an all-to-one mapping to convert early frames into those with tracer distribution similar to the last reference frame. The TAI-GAN consists of a feature-wise linear modulation layer that encodes channel-wise parameters generated from temporal information and rough cardiac segmentation masks with local shifts that serve as anatomical information. Our proposed method was evaluated on a clinical 82-Rb PET dataset, and the results show that our TAI-GAN can produce converted early frames with high image quality, comparable to the real reference frames. After TAI-GAN conversion, the motion estimation accuracy and subsequent myocardial blood flow (MBF) quantification with both conventional and deep learning-based motion correction methods were improved compared to using the original frames.
Abstract:Insufficiency of training data is a persistent issue in medical image analysis, especially for task-based functional magnetic resonance images (fMRI) with spatio-temporal imaging data acquired using specific cognitive tasks. In this paper, we propose an approach for generating synthetic fMRI sequences that can then be used to create augmented training datasets in downstream learning tasks. To synthesize high-resolution task-specific fMRI, we adapt the $\alpha$-GAN structure, leveraging advantages of both GAN and variational autoencoder models, and propose different alternatives in aggregating temporal information. The synthetic images are evaluated from multiple perspectives including visualizations and an autism spectrum disorder (ASD) classification task. The results show that the synthetic task-based fMRI can provide effective data augmentation in learning the ASD classification task.
Abstract:The rapid tracer kinetics of rubidium-82 ($^{82}$Rb) and high variation of cross-frame distribution in dynamic cardiac positron emission tomography (PET) raise significant challenges for inter-frame motion correction, particularly for the early frames where conventional intensity-based image registration techniques are not applicable. Alternatively, a promising approach utilizes generative methods to handle the tracer distribution changes to assist existing registration methods. To improve frame-wise registration and parametric quantification, we propose a Temporally and Anatomically Informed Generative Adversarial Network (TAI-GAN) to transform the early frames into the late reference frame using an all-to-one mapping. Specifically, a feature-wise linear modulation layer encodes channel-wise parameters generated from temporal tracer kinetics information, and rough cardiac segmentations with local shifts serve as the anatomical information. We validated our proposed method on a clinical $^{82}$Rb PET dataset and found that our TAI-GAN can produce converted early frames with high image quality, comparable to the real reference frames. After TAI-GAN conversion, motion estimation accuracy and clinical myocardial blood flow (MBF) quantification were improved compared to using the original frames. Our code is published at https://github.com/gxq1998/TAI-GAN.
Abstract:The multifactorial etiology of autism spectrum disorder (ASD) suggests that its study would benefit greatly from multimodal approaches that combine data from widely varying platforms, e.g., neuroimaging, genetics, and clinical characterization. Prior neuroimaging-genetic analyses often apply naive feature concatenation approaches in data-driven work or use the findings from one modality to guide posthoc analysis of another, missing the opportunity to analyze the paired multimodal data in a truly unified approach. In this paper, we develop a more integrative model for combining genetic, demographic, and neuroimaging data. Inspired by the influence of genotype on phenotype, we propose using an attention-based approach where the genetic data guides attention to neuroimaging features of importance for model prediction. The genetic data is derived from copy number variation parameters, while the neuroimaging data is from functional magnetic resonance imaging. We evaluate the proposed approach on ASD classification and severity prediction tasks, using a sex-balanced dataset of 228 ASD and typically developing subjects in a 10-fold cross-validation framework. We demonstrate that our attention-based model combining genetic information, demographic data, and functional magnetic resonance imaging results in superior prediction performance compared to other multimodal approaches.