Faculty of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran
Abstract: This study proposes an attention-based, statistical distance-guided unsupervised domain adaptation model for multi-class cardiovascular magnetic resonance (CMR) image quality assessment. The proposed model consists of a feature extractor, a label predictor and a statistical distance estimator. An annotated dataset serves as the source set and an unlabelled dataset with a different statistical distribution as the target set. The statistical distance estimator approximates the Wasserstein distance between the feature vectors extracted from the source and target data in a mini-batch. The label predictor predicts the labels of the source data and is trained with a combined loss function comprising cross-entropy and centre loss terms plus the estimated value of the distance estimator. Four datasets, including imaging and k-space data, were used to evaluate the proposed model in identifying four common CMR imaging artefacts: respiratory and cardiac motion, Gibbs ringing and aliasing. Extensive experiments showed that the proposed model, in both image and k-space analysis, performs acceptably in covering the domain shift between the source and target sets. Model explainability evaluations and ablation studies confirmed the proper functioning and effectiveness of all the model's modules. The proposed model outperformed previous studies in both performance and the number of examined artefacts. Owing to its domain shift coverage without a tedious data-labelling process, the proposed model can be used for CMR post-imaging quality control or for image and k-space quality assessment in large-scale cohort studies.
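As a rough illustration of the combined objective this abstract describes, here is a minimal PyTorch sketch in the spirit of Wasserstein-guided domain adaptation. All network sizes, loss weights and the critic design are assumptions for illustration, not the paper's settings, and the adversarial update of the critic is only noted in a comment.

```python
import torch
import torch.nn as nn

# Illustrative component sizes; the paper's actual architectures differ.
feat = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 256), nn.ReLU())   # feature extractor
clf = nn.Linear(256, 4)                                                   # label predictor
critic = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))   # distance estimator

centers = nn.Parameter(torch.zeros(4, 256))   # learnable class centres for the centre loss
ce = nn.CrossEntropyLoss()
lam, gamma = 0.1, 0.01                        # assumed loss weights

def training_step(xs, ys, xt):
    """xs/ys: labelled source batch; xt: unlabelled target batch."""
    fs, ft = feat(xs), feat(xt)
    # The difference of mean critic outputs approximates the Wasserstein
    # distance between source and target features in the mini-batch.
    # (In practice the critic itself is trained in a separate, adversarial step.)
    w_dist = critic(fs).mean() - critic(ft).mean()
    centre_loss = ((fs - centers[ys]) ** 2).sum(dim=1).mean()
    return ce(clf(fs), ys) + gamma * centre_loss + lam * w_dist

xs, xt = torch.randn(8, 1, 64, 64), torch.randn(8, 1, 64, 64)
ys = torch.randint(0, 4, (8,))
print(training_step(xs, ys, xt))
```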
Abstract: Background: Image classification is one of the key pillars of medical image analysis. Despite remarkable improvements in medical image classification, deep learning (DL) faces challenges that prevent its practical application. Differences in data distribution can degrade DL performance, a phenomenon known as the domain shift problem. The need for bulk annotated data for model training, the large size of models and the preservation of patient privacy are further challenges of using DL in medical image classification. This study presents a strategy that addresses these issues simultaneously. Method: The proposed domain-adaptive model based on knowledge distillation can classify images given only limited annotated data of a different distribution. The designed multiple teachers-meticulous student model trains a student network that addresses these challenges by receiving the parameters of several teacher networks. The proposed model was evaluated on a respiratory motion artefact detection task using six available datasets of different distributions. Results: Extensive experiments on several datasets show the superiority of the proposed model in addressing the domain shift problem and the lack of access to bulk annotated data. The model also preserves patient privacy by receiving only the teacher network parameters instead of the original data, and consolidates the knowledge of several DL models into a single model with almost the same performance. Conclusions: By achieving these objectives simultaneously, the proposed model can pave the way for practical clinical applications of deep classification methods.
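The abstract describes consolidating several teachers into one student; the paper's mechanism works on teacher parameters, but a common logit-based multi-teacher distillation loss is sketched below purely for illustration. The temperature and weighting are assumed values.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, labels, T=4.0, alpha=0.7):
    """Blend a supervised term with a soft term from the averaged teachers."""
    # Average the softened distributions of all teachers.
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=1) for t in teacher_logits_list]).mean(dim=0)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  teacher_probs, reduction="batchmean") * T * T
    return alpha * kd + (1 - alpha) * F.cross_entropy(student_logits, labels)

student = torch.randn(8, 2)                    # e.g. artefact present / absent
teachers = [torch.randn(8, 2) for _ in range(3)]
labels = torch.randint(0, 2, (8,))
print(distillation_loss(student, teachers, labels))
```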
Abstract: Radiation therapy is the primary method used to treat cancer in the clinic. Its goal is to deliver a precise dose to the planning target volume (PTV) while protecting the surrounding organs at risk (OARs). However, the traditional workflow used by dosimetrists to plan the treatment is time-consuming and subjective, requiring iterative adjustments based on their experience. Deep learning methods can be used to predict dose distribution maps to address these limitations. This study proposes a cascade model for organs-at-risk segmentation and dose distribution prediction. An encoder-decoder network has been developed for the segmentation task, in which the encoder consists of transformer blocks and the decoder uses multi-scale convolutional blocks. Another cascade encoder-decoder network with a pyramid architecture has been proposed for dose distribution prediction. The proposed model has been evaluated using an in-house head and neck cancer dataset of 96 patients and OpenKBP, a public head and neck cancer dataset of 340 patients. The segmentation subnet achieved Dice and HD95 scores of 0.79 and 2.71, respectively, outperforming the existing baselines. The dose distribution prediction subnet outperformed the winner of the OpenKBP 2020 competition, with dose and DVH scores of 2.77 and 1.79, respectively. The predicted dose maps showed good agreement with the ground truth, improving further when linked with the auxiliary segmentation task. The proposed model outperformed state-of-the-art methods, especially in regions with low prescribed doses.
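A minimal sketch of the cascade idea, assuming tiny convolutional stand-ins for the paper's transformer/pyramid subnets: the predicted OAR masks are concatenated with the CT input before dose prediction, which is how the auxiliary segmentation task can guide the dose map.

```python
import torch
import torch.nn as nn

class TinySubnet(nn.Module):
    """Stand-in for each encoder-decoder subnet (not the paper's architecture)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, out_ch, 3, padding=1))
    def forward(self, x):
        return self.net(x)

seg_net = TinySubnet(in_ch=1, out_ch=5)        # CT slice -> 5 OAR channels (assumed)
dose_net = TinySubnet(in_ch=1 + 5, out_ch=1)   # CT + predicted masks -> dose map

ct = torch.randn(2, 1, 128, 128)
masks = seg_net(ct).softmax(dim=1)                 # auxiliary segmentation output
dose = dose_net(torch.cat([ct, masks], dim=1))     # cascade: masks guide dose prediction
print(dose.shape)  # torch.Size([2, 1, 128, 128])
```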
Abstract: DeepFake involves the use of deep learning and artificial intelligence techniques to produce or alter video and image content, typically generated by GANs. It can be misused, leading to fictitious news, ethical and financial crimes, and degraded performance of facial recognition systems. Thus, detecting real or fake images is important, especially for authenticating the originality of people's images or videos. One of the most important challenges in this area is obstruction, which decreases system precision. In this study, we present a deep learning approach that uses the entire face and face patches to distinguish real from fake images in the presence of obstruction, with a three-path decision: first, reasoning over the entire face; second, a decision based on the concatenation of the feature vectors of the face patches; and third, a majority vote over these features. To test our approach, new datasets including real and fake images were created. To produce fake images, StyleGAN and StyleGAN2 were trained on FFHQ images, and StarGAN and PGGAN were trained on CelebA images. The CelebA and FFHQ datasets are used as real images. The proposed approach reaches higher results in earlier epochs than other methods and improves the SoTA results by 0.4\%-7.9\% on the different constructed datasets. The experimental results also show that weighting the patches may improve accuracy.
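A minimal sketch of the three-path decision, assuming a shared toy backbone, 32x32 crops and four face patches; all sizes and heads are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Shared backbone for 32x32 RGB crops; sizes are illustrative assumptions.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU())
face_head = nn.Linear(64, 2)        # path 1: whole-face decision
concat_head = nn.Linear(64 * 4, 2)  # path 2: concatenated patch features
patch_head = nn.Linear(64, 2)       # path 3: per-patch votes

def three_path_decision(face, patches):
    """face: resized full face; patches: 4 face crops (e.g. eyes, nose, mouth)."""
    feats = [backbone(p) for p in patches]
    p1 = face_head(backbone(face)).argmax(1)              # entire-face reasoning
    p2 = concat_head(torch.cat(feats, dim=1)).argmax(1)   # fused patch features
    votes = torch.stack([patch_head(f).argmax(1) for f in feats])
    p3 = (votes.float().mean(dim=0) > 0.5).long()         # majority vote over patches
    return p1, p2, p3

face = torch.randn(8, 3, 32, 32)
patches = [torch.randn(8, 3, 32, 32) for _ in range(4)]
print(three_path_decision(face, patches))
```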
Abstract: Background and Objectives: Cardiovascular magnetic resonance (CMR) imaging is a powerful modality for the functional and anatomical assessment of various cardiovascular diseases. Sufficient image quality is essential for proper diagnosis and treatment. The large number of medical images, the variety of imaging artefacts and the workload of imaging centres all underline the need for automatic image quality assessment (IQA). However, automated IQA requires access to bulk annotated datasets for training deep learning (DL) models. Labelling medical images is a tedious, costly and time-consuming process, which creates a fundamental challenge for proposing DL-based methods for medical applications. This study aims to present a new method for CMR IQA when access to annotated datasets is limited. Methods: The proposed generalised deep meta-learning model can evaluate quality by learning related tasks in a prior stage and then fine-tuning the resulting model on a small labelled dataset of the desired tasks. The model was evaluated on data from over 6,000 subjects from the UK Biobank for five defined tasks: detecting respiratory motion, cardiac motion, aliasing and Gibbs ringing artefacts, and images without artefacts. Results: Extensive experiments show the superiority of the proposed model. Moreover, comparing the model's accuracy with that of a domain adaptation model indicates a significant difference when using only 64 annotated images of the desired tasks. Conclusion: The proposed model can identify unknown artefacts in images with acceptable accuracy, making it suitable for medical applications and for quality assessment of large cohorts.
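A minimal sketch of the fine-tuning stage the abstract describes: a meta-trained model is adapted on a small labelled support set (here the 64 images mentioned above). The network and optimiser settings are assumptions, and the prior meta-training stage itself is omitted.

```python
import torch
import torch.nn as nn

# Toy stand-in for the meta-trained network (5 task classes, as in the abstract).
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128), nn.ReLU(), nn.Linear(128, 5))
loss_fn = nn.CrossEntropyLoss()

def adapt(model, support_x, support_y, steps=5, lr=1e-2):
    """Fine-tune a meta-trained model on a small labelled support set."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(support_x), support_y).backward()
        opt.step()
    return model

# e.g. only 64 annotated images of the desired task, as in the abstract
x, y = torch.randn(64, 1, 64, 64), torch.randint(0, 5, (64,))
adapted = adapt(model, x, y)
```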
Abstract: Purpose: Predicting the progression of MCI to Alzheimer's disease is an important step in slowing the progression of the disease, and many deep learning-based methods have been introduced for this task. Among these approaches, ROI-based methods stand out in terms of accuracy and complexity. In these techniques, specific parts of the brain are extracted manually as ROIs for all patients. Extracting ROIs manually is time-consuming, and the results depend on human expertise and precision. Method: To overcome these limitations, we propose a novel method for detecting ROIs automatically based on explainable AI, using Grad-CAM and a 3D CNN model that extracts ROIs per patient. After the ROIs are extracted automatically, Alzheimer's disease is predicted using an ROI-based 3D CNN. Results: We apply our method to 176 MCI patients from the ADNI dataset and obtain remarkable results compared with state-of-the-art methods. The accuracy obtained using 5-fold cross-validation is 98.6\% and the AUC is 1.0. We also compare the ROI-based method with the whole-brain-based method; the results show a marked performance improvement. Conclusion: The experimental results show that the proposed automatic ROI extraction performs well for Alzheimer's disease prediction. The proposed method can also be used for Alzheimer's disease classification and diagnosis.
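A minimal sketch of Grad-CAM-driven ROI extraction on a toy 3D CNN, assuming hooks on the last convolutional layer and a simple threshold at half the maximum activation; the paper's actual network and thresholding rule are not reproduced here.

```python
import torch
import torch.nn as nn

conv = nn.Conv3d(1, 8, 3, padding=1)
model = nn.Sequential(conv, nn.ReLU(),
                      nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(8, 2))

# Capture activations and their gradients at the last conv layer.
acts, grads = {}, {}
conv.register_forward_hook(lambda m, i, o: acts.update(a=o))
conv.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

x = torch.randn(1, 1, 32, 32, 32)            # toy MRI volume
model(x)[0, 1].backward()                     # gradient of the target class score

weights = grads["g"].mean(dim=(2, 3, 4), keepdim=True)   # channel importance
cam = torch.relu((weights * acts["a"]).sum(dim=1))       # 3D Grad-CAM heatmap
roi_mask = cam > 0.5 * cam.max()              # threshold -> candidate ROI voxels
print(roi_mask.sum().item(), "voxels selected")
```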
Abstract: Recently, the attention-enriched encoder-decoder framework has attracted great interest in image captioning due to its impressive progress. Many visual attention models directly leverage meaningful regions to generate image descriptions. However, a direct transition from the visual space to text is not enough to generate fine-grained captions. This paper exploits a feature-compounding approach to bring together high-level semantic concepts and visual information about the contextual environment in a fully end-to-end manner. We propose a stacked cross-modal feature consolidation (SCFC) attention network for image captioning, in which cross-modal features are consolidated simultaneously through a novel compounding function in a multi-step reasoning fashion. We jointly employ spatial information and context-aware attributes (CAA) as the principal components of the proposed compounding function, where the CAA provides a concise context-sensitive semantic representation. To make better use of the potential of the consolidated features, we further propose an SCFC-LSTM as the caption generator, which can leverage discriminative semantic information throughout the caption generation process. Experimental results indicate that the proposed SCFC outperforms various state-of-the-art image captioning models in terms of popular metrics on the MSCOCO and Flickr30K datasets.
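A minimal sketch of the multi-step consolidation idea, assuming a generic attention module, a linear compounding function and an LSTM cell as stand-ins for the SCFC components; dimensions, step count and the fusion rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

d = 256
attend = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
compound = nn.Linear(2 * d, d)      # hypothetical compounding function
lstm = nn.LSTMCell(d, d)            # stand-in for the SCFC-LSTM cell

regions = torch.randn(2, 36, d)     # region-level visual features (e.g. 36 proposals)
attrs = torch.randn(2, d)           # context-aware attribute embedding (CAA stand-in)
h = torch.zeros(2, d)
c = torch.zeros(2, d)

for _ in range(3):                  # multi-step reasoning over consolidated features
    vis, _ = attend(h.unsqueeze(1), regions, regions)   # state-guided visual attention
    fused = torch.tanh(compound(torch.cat([vis.squeeze(1), attrs], dim=1)))
    h, c = lstm(fused, (h, c))      # h would feed the word-prediction layer
```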
Abstract: Cardiovascular magnetic resonance (CMR) imaging has become a modality of superior power for the diagnosis and prognosis of cardiovascular diseases. One of the essential basic quality controls for CMR images is verifying complete cardiac coverage, which is necessary for volumetric and functional assessment. This study examines full cardiac coverage using a 3D convolutional model and then reduces the number of false predictions using an innovative salient region detection model. Salient regions are extracted from short-axis cine CMR stacks using a proposed three-step algorithm. Combining the 3D CNN baseline model with the proposed salient region detection model yields a cascade detector that reduces the number of false negatives of the baseline model. Results obtained on images of over 6,200 participants in the UK Biobank population cohort study show the superiority of the proposed model over previous state-of-the-art studies; this is the largest dataset, in terms of the number of participants, used for cardiac coverage control. The accuracy of the baseline model in identifying the presence/absence of basal/apical slices is 96.25\% and 94.51\%, respectively, which increases to 96.88\% and 95.72\% after refinement with the proposed salient region detection model. Using the salient region detection model to force the baseline model to focus on the most informative areas of the images helps the model correct the predictions of misclassified samples. The performance of the proposed fully automated model indicates that it can be used for image quality control in population cohort datasets as well as for real-time post-imaging quality assessment.
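A minimal sketch of the cascade detector, assuming a toy 3D CNN baseline and a fixed centre crop as a stand-in for the paper's three-step salient region algorithm: only negative predictions are re-examined on the salient region, which is how false negatives can be recovered.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

baseline = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(8, 2))

def salient_crop(stack):
    # Stand-in for the paper's three-step salient region algorithm:
    # a fixed centre crop is used here purely for illustration.
    return stack[..., 8:24, 8:24, 8:24]

def cascade_predict(stack):
    pred = baseline(stack).argmax(1)             # 1 = slice present, 0 = missing
    crop = F.interpolate(salient_crop(stack), size=32, mode="trilinear")
    recheck = baseline(crop).argmax(1)
    # Only negatives are re-examined, so the cascade can correct
    # false negatives without touching confident positive predictions.
    return torch.where(pred == 0, recheck, pred)

stacks = torch.randn(4, 1, 32, 32, 32)
print(cascade_predict(stacks))
```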
Abstract: Intraretinal fluids, or cysts, are one of the important symptoms of macular pathologies and are efficiently visualized in OCT images. Automatic segmentation of these abnormalities has been widely investigated in medical image processing studies. In this paper, we propose a new U-Net-based approach for intraretinal cyst (IRC) segmentation across different vendors that addresses some of the challenges faced by previous deep learning-based techniques. The proposed method has two main steps: (1) prior information embedding and input data adjustment, and (2) the IRC segmentation model. In the first step, we inject information into the network in a way that overcomes some of the network's limitations in receiving data and learning important contextual knowledge. In the next step, we introduce a connection module between the encoder and decoder parts of the standard U-Net architecture that transfers information more effectively from the encoder to the decoder. Two public datasets, OPTIMA and KERMANY, were employed to evaluate the proposed method. Results show that the proposed method is an efficient vendor-independent approach for IRC segmentation, with mean Dice values of 0.78 and 0.81 on the OPTIMA and KERMANY datasets, respectively.
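A minimal sketch of an encoder-to-decoder connection module, assuming a simple convolutional gating design; the paper's actual module is not reproduced here, only the idea of refining skip features before they reach the decoder.

```python
import torch
import torch.nn as nn

class ConnectionModule(nn.Module):
    """Illustrative encoder-to-decoder connection: refines the skip
    features before concatenation (not the paper's exact design)."""
    def __init__(self, ch):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 1), nn.Sigmoid())
    def forward(self, skip):
        return skip * self.refine(skip)   # gate the skip features

skip = torch.randn(1, 64, 56, 56)          # encoder feature map
gated = ConnectionModule(64)(skip)         # passed to the decoder instead of the raw skip
print(gated.shape)
```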
Abstract: Population imaging studies rely upon good-quality medical imagery before downstream image quantification. This study provides an automated approach to assessing image quality from cardiovascular magnetic resonance (CMR) imaging at scale. We identify four common CMR imaging artefacts: respiratory motion, cardiac motion, Gibbs ringing and aliasing. The model can deal with images acquired in different views, including two-, three- and four-chamber long-axis and short-axis cine CMR images. Two deep learning-based models, in the spatial and frequency domains, are proposed. Besides recognising these artefacts, the proposed models address the common challenge of not having access to data labels: an unsupervised domain adaptation method and a Fourier-based convolutional neural network are proposed to overcome it. We show that the proposed models reliably allow for CMR image quality assessment. The accuracies obtained for the spatial model in supervised and weakly supervised learning are 99.41±0.24 and 96.37±0.66 on the UK Biobank dataset, respectively. Unsupervised domain adaptation can partly overcome the lack of data labels; the maximum achieved domain gap coverage is 16.86\%. Domain adaptation can significantly improve a 5-class classification task and deal with considerable domain shift without data labels. The proposed frequency-domain model speeds up training and testing: it achieves the same accuracy as the spatial model while being 1.548 times faster. This model can also be used directly on k-space data, with no need for image reconstruction.
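A minimal sketch of a classifier that operates directly on k-space, assuming the complex signal is encoded as two real channels; the architecture is illustrative, not the paper's Fourier-based CNN.

```python
import torch
import torch.nn as nn

class KSpaceClassifier(nn.Module):
    """Minimal sketch: classify directly from complex k-space by feeding
    the real and imaginary parts as two input channels (an assumed
    input encoding, not the paper's exact design)."""
    def __init__(self, n_classes=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, n_classes))
    def forward(self, kspace):                 # complex tensor (B, H, W)
        x = torch.stack([kspace.real, kspace.imag], dim=1)
        return self.net(x)

image = torch.randn(2, 128, 128)
kspace = torch.fft.fft2(image)                 # classify here; no reconstruction needed
logits = KSpaceClassifier()(kspace)
print(logits.shape)  # torch.Size([2, 5])
```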