Abstract:Gene expression analysis is a critical method for cancer classification, enabling precise diagnoses through the identification of unique molecular signatures associated with various tumors. Identifying cancer-specific genes from gene expression values enables a more tailored and personalized treatment approach. However, the high dimensionality of mRNA gene expression data poses challenges for analysis and data extraction. This research presents a comprehensive pipeline designed to accurately identify 33 distinct cancer types and their corresponding gene sets. It incorporates a combination of normalization and feature selection techniques to reduce dataset dimensionality effectively while ensuring high performance. Notably, our pipeline successfully identifies a substantial number of cancer-specific genes using a reduced feature set of just 500, in contrast to using the full dataset comprising 19,238 features. By employing an ensemble approach that combines three top-performing classifiers, a classification accuracy of 96.61% was achieved. Furthermore, we leverage Explainable AI to elucidate the biological significance of the identified cancer-specific genes, employing Differential Gene Expression (DGE) analysis.
Abstract:Nuclei instance segmentation is crucial in oncological diagnosis and cancer pathology research. H&E stained images are commonly used for medical diagnosis, but pre-processing is necessary before using them for image processing tasks. Two principal pre-processing methods are formalin-fixed paraffin-embedded samples (FFPE) and frozen tissue samples (FS). While FFPE is widely used, it is time-consuming, while FS samples can be processed quickly. Analyzing H&E stained images derived from fast sample preparation, staining, and scanning can pose difficulties due to the swift process, which can result in the degradation of image quality. This paper proposes a method that leverages the unique optical characteristics of H&E stained images. A three-branch U-Net architecture has been implemented, where each branch contributes to the final segmentation results. The process includes applying watershed algorithm to separate overlapping regions and enhance accuracy. The Triple U-Net architecture comprises an RGB branch, a Hematoxylin branch, and a Segmentation branch. This study focuses on a novel dataset named CryoNuSeg. The results obtained through robust experiments outperform the state-of-the-art results across various metrics. The benchmark score for this dataset is AJI 52.5 and PQ 47.7, achieved through the implementation of U-Net Architecture. However, the proposed Triple U-Net architecture achieves an AJI score of 67.41 and PQ of 50.56. The proposed architecture improves more on AJI than other evaluation metrics, which further justifies the superiority of the Triple U-Net architecture over the baseline U-Net model, as AJI is a more strict evaluation metric. The use of the three-branch U-Net model, followed by watershed post-processing, significantly surpasses the benchmark scores, showing substantial improvement in the AJI score