Abstract: Parkinson's disease (PD), a degenerative disorder of the central nervous system, is commonly diagnosed using functional medical imaging techniques such as single-photon emission computed tomography (SPECT). In this study, we utilized two SPECT data sets (n = 634 and n = 202) from different hospitals to develop a model capable of accurately predicting PD stages, a multiclass classification task. We used the entire three-dimensional (3D) brain images as input and experimented with various model architectures. Initially, we treated the 3D images as sequences of two-dimensional (2D) slices and fed them sequentially into 2D convolutional neural network (CNN) models pretrained on ImageNet, averaging the slice-wise outputs to obtain the final predicted stage. We also applied 3D CNN models pretrained on Kinetics-400. Additionally, we incorporated an attention mechanism to account for the varying importance of different slices in the prediction. To further enhance model efficacy and robustness, we trained on the two data sets simultaneously with shared weights, a technique known as cotraining. Our results demonstrated that 2D models pretrained on ImageNet outperformed 3D models pretrained on Kinetics-400, and that models utilizing the attention mechanism outperformed both the 2D and 3D models. The cotraining technique proved effective in improving model performance when the cotraining data sets were sufficiently large.
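The slice-wise 2D processing and attention pooling described above can be sketched as follows. This is a minimal illustration assuming PyTorch; the small stand-in backbone, feature dimension, and number of stages are hypothetical placeholders rather than the authors' actual ImageNet-pretrained configuration or cotraining setup.

```python
import torch
import torch.nn as nn


class SliceAttentionClassifier(nn.Module):
    """Sketch: slice-wise 2D-CNN classification of a 3D SPECT volume.

    Each axial slice passes through a shared 2D backbone (in the paper an
    ImageNet-pretrained CNN; a small stand-in is used here). Per-slice
    features are combined either by plain averaging or by learned attention
    weights before the stage classifier.
    """

    def __init__(self, num_stages: int = 5, feat_dim: int = 64, use_attention: bool = True):
        super().__init__()
        self.backbone = nn.Sequential(              # stand-in for a pretrained 2D CNN
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),
            nn.Linear(16 * 8 * 8, feat_dim), nn.ReLU(),
        )
        self.attn = nn.Linear(feat_dim, 1)          # one importance score per slice
        self.classifier = nn.Linear(feat_dim, num_stages)
        self.use_attention = use_attention

    def forward(self, volume: torch.Tensor) -> torch.Tensor:
        # volume: (batch, depth, H, W) -> treat every slice as an independent 2D image
        b, d, h, w = volume.shape
        feats = self.backbone(volume.reshape(b * d, 1, h, w)).reshape(b, d, -1)
        if self.use_attention:
            weights = torch.softmax(self.attn(feats), dim=1)    # (b, d, 1)
            pooled = (weights * feats).sum(dim=1)               # attention-weighted pooling
        else:
            pooled = feats.mean(dim=1)                          # plain slice averaging
        return self.classifier(pooled)                          # PD-stage logits


logits = SliceAttentionClassifier()(torch.randn(2, 32, 64, 64))
print(logits.shape)  # torch.Size([2, 5])
```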
Abstract: The classification of medical images is a pivotal aspect of disease diagnosis, often enhanced by deep learning techniques. However, traditional approaches typically focus on unimodal medical image data, neglecting the integration of diverse non-image patient data. This paper proposes a novel Cross-Graph Modal Contrastive Learning (CGMCL) framework for multimodal medical image classification. The model effectively integrates both image and non-image data by constructing cross-modality graphs and leveraging contrastive learning to align multimodal features in a shared latent space. An inter-modality feature scaling module further optimizes the representation learning process by reducing the gap between heterogeneous modalities. The proposed approach is evaluated on two datasets: a Parkinson's disease (PD) dataset and a public melanoma dataset. Results demonstrate that CGMCL outperforms conventional unimodal methods in accuracy, interpretability, and early disease prediction. Additionally, the method shows superior performance in multi-class melanoma classification. The CGMCL framework provides valuable insights into medical image classification while offering improved disease interpretability and predictive capabilities.
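The central idea of aligning image and non-image features in a shared latent space with a contrastive objective can be sketched as below. This is a simplified, assumed InfoNCE-style formulation in PyTorch; the full CGMCL framework additionally builds cross-modality graphs and an inter-modality feature scaling module, which are not reproduced here.

```python
import torch
import torch.nn.functional as F


def cross_modal_contrastive_loss(img_feats: torch.Tensor,
                                 tab_feats: torch.Tensor,
                                 temperature: float = 0.1) -> torch.Tensor:
    """Sketch: InfoNCE-style alignment of image and non-image embeddings.

    For each patient, the image embedding and the clinical (non-image)
    embedding form a positive pair; all other patients in the batch act as
    negatives, pulling the two modalities into a shared latent space.
    """
    img = F.normalize(img_feats, dim=1)           # (N, d) unit-length image features
    tab = F.normalize(tab_feats, dim=1)           # (N, d) unit-length clinical features
    logits = img @ tab.t() / temperature          # (N, N) cross-modal similarities
    targets = torch.arange(img.size(0))           # diagonal entries are the positives
    # symmetric loss: image-to-clinical and clinical-to-image directions
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


loss = cross_modal_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```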
Abstract: Parkinson's Disease (PD) is a neurodegenerative disorder that impairs movement and afflicts over 10 million people worldwide. Previous studies have developed deep learning models for predicting Parkinson's disease primarily from medical images and did not leverage the manifold structure of the dataset. Our study introduces a multimodal approach that combines image and non-image features through contrastive cross-view graph fusion for Parkinson's disease classification. Specifically, we designed a multimodal co-attention module to integrate embeddings from two distinct graph views derived from low-dimensional representations of images and clinical features, enabling the extraction of more stable and structured features from the multiview data. Additionally, we devised a simplified fusion method that uses a contrastive loss over positive and negative pairs to enhance the model's cross-view fusion learning. In our experiments, the graph-view multimodal approach achieves 91% accuracy and an AUC of 92.8% under five-fold cross-validation, and it also demonstrates superior predictive performance on non-image data compared with approaches that rely solely on conventional machine learning.
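The co-attention fusion of the two graph-view embeddings can be sketched roughly as follows. This assumes PyTorch's nn.MultiheadAttention and treats the per-view node embeddings as already produced by upstream graph encoders; the embedding dimension, head count, and mean pooling are illustrative choices, and the contrastive pair loss is omitted.

```python
import torch
import torch.nn as nn


class CoAttentionFusion(nn.Module):
    """Sketch: co-attention fusion of two graph-view embeddings.

    Node embeddings from an image-derived graph view attend to embeddings
    from a clinical-feature graph view (and vice versa); the two attended
    representations are pooled and concatenated for classification.
    """

    def __init__(self, dim: int = 64, num_classes: int = 2, heads: int = 4):
        super().__init__()
        self.img_to_clin = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.clin_to_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(2 * dim, num_classes)

    def forward(self, img_view: torch.Tensor, clin_view: torch.Tensor) -> torch.Tensor:
        # img_view, clin_view: (batch, nodes, dim) embeddings from the two graph views
        a, _ = self.img_to_clin(img_view, clin_view, clin_view)    # image queries clinical
        b, _ = self.clin_to_img(clin_view, img_view, img_view)     # clinical queries image
        fused = torch.cat([a.mean(dim=1), b.mean(dim=1)], dim=-1)  # pool and concatenate views
        return self.head(fused)


logits = CoAttentionFusion()(torch.randn(4, 10, 64), torch.randn(4, 10, 64))
print(logits.shape)  # torch.Size([4, 2])
```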