Meryem Altin Karagoz

Residual Vision Transformer (ResViT) Based Self-Supervised Learning Model for Brain Tumor Classification

Nov 19, 2024

Meryem Altin Karagoz, O. Ufuk Nalbantoglu, Geoffrey C. Fox

Figures 1-4 for Residual Vision Transformer (ResViT) Based Self-Supervised Learning Model for Brain Tumor Classification

Abstract: Deep learning has shown considerable promise for interpreting MRI in brain tumor diagnosis. However, deep learning models are limited by the scarcity of brain MRI datasets available for effective training. Self-supervised learning (SSL) offers a data-efficient way to address limited datasets. This paper therefore introduces a two-stage generative SSL model for brain tumor classification. The first stage pre-trains a Residual Vision Transformer (ResViT) model on MRI synthesis as a pretext task. The second stage fine-tunes a ResViT-based classifier as the downstream task. ResViT is a hybrid CNN-transformer architecture, used in both the pretext and downstream tasks, that captures local features via the CNN and global features via the ViT. In addition, synthetic MRI images are used to balance the training set. The proposed model is evaluated on the public BraTS 2023, Figshare, and Kaggle datasets. We compare it with several deep learning models: A-UNet, ResNet-9, pix2pix, and pGAN for MRI synthesis, and ConvNeXtTiny, ResNet101, DenseNet12, Residual CNN, and ViT for classification. According to the results, pretraining the proposed model on the MRI dataset is superior to pretraining on the ImageNet dataset. Overall, the proposed model attains the highest accuracy: 90.56% on the BraTS dataset with the T1 sequence, 98.53% on the Figshare dataset, and 98.47% on the Kaggle brain tumor dataset. These results indicate that combining SSL, fine-tuning, data augmentation, and a hybrid CNN-ViT architecture is an effective way to handle insufficient data in MRI analysis.
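
The abstract mentions using synthetic MRI images to balance the training set but does not say how the number of synthetic images per class is chosen. As a minimal sketch of one plausible reading, assuming each class is topped up to the size of the largest class, the per-class quota could be computed as below. The function name `synthetic_quota`, the class names, and the counts are hypothetical and do not come from the paper.

```python
from collections import Counter


def synthetic_quota(labels):
    """Return, per class, how many synthetic samples would bring it up to
    the size of the largest class (an assumed balancing rule)."""
    counts = Counter(labels)
    if not counts:
        return {}
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


# Hypothetical, illustrative label list for a small imbalanced training set.
train_labels = ["glioma"] * 5 + ["meningioma"] * 3 + ["pituitary"] * 4
quota = synthetic_quota(train_labels)
print(quota)
```

In the two-stage setup the abstract describes, the synthesis model from the pretext stage would presumably be the source of those extra images; that link is an assumption here, not something the abstract states.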
