Foundation models have transformed medical image analysis by providing robust feature representations that reduce the need for large-scale task-specific training. However, current benchmarks in dermatology often reduce the complex diagnostic taxonomy to flat, binary classification tasks, such as distinguishing melanoma from benign nevi. This oversimplification obscures a model's ability to perform fine-grained differential diagnosis, which is critical for clinical workflow integration. This study evaluates the utility of embeddings derived from ten foundation models, spanning general computer vision, general medical imaging, and dermatology-specific domains, for hierarchical skin lesion classification. Using the DERM12345 dataset, which comprises 40 lesion subclasses, we extracted frozen embeddings and trained lightweight adapter models with five-fold cross-validation. We introduce a hierarchical evaluation framework that assesses performance across four levels of clinical granularity: 40 Subclasses, 15 Main Classes, Superclasses (at 4- and 2-class granularity), and Binary Malignancy. Our results reveal a "granularity gap" in model capabilities: MedImageInsights achieved the strongest overall performance (97.52% weighted F1-score on Binary Malignancy detection) but declined to 65.50% on fine-grained 40-class subtype classification. Conversely, MedSigLip (69.79%) and the dermatology-specific models (Derm Foundation and MONET) excelled at fine-grained 40-class subtype discrimination while trailing MedImageInsights on the broader classification tasks. Our findings suggest that while general medical foundation models are highly effective for high-level screening, specialized modeling strategies are necessary for the granular distinctions required in diagnostic support systems.
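As a minimal sketch of the evaluation protocol described above, the snippet below trains a lightweight linear adapter on frozen embeddings with five-fold cross-validation and scores it at two hierarchy levels by collapsing fine-grained predictions. The random arrays and the subclass-to-malignancy mapping are placeholders, not the DERM12345 data or taxonomy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 512))          # placeholder for frozen foundation-model embeddings
subclass_labels = rng.integers(0, 40, size=1000)   # placeholder 40-class labels
SUBCLASS_TO_MALIGNANCY = {c: int(c < 20) for c in range(40)}  # hypothetical mapping, not the real taxonomy

fold_f1_sub, fold_f1_bin = [], []
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(embeddings, subclass_labels):
    adapter = LogisticRegression(max_iter=1000)     # lightweight linear adapter on frozen features
    adapter.fit(embeddings[train_idx], subclass_labels[train_idx])
    pred_sub = adapter.predict(embeddings[test_idx])
    # Collapse fine-grained predictions to the coarser level before scoring.
    pred_bin = np.array([SUBCLASS_TO_MALIGNANCY[p] for p in pred_sub])
    true_bin = np.array([SUBCLASS_TO_MALIGNANCY[t] for t in subclass_labels[test_idx]])
    fold_f1_sub.append(f1_score(subclass_labels[test_idx], pred_sub, average="weighted"))
    fold_f1_bin.append(f1_score(true_bin, pred_bin, average="weighted"))

print(f"40-class weighted F1: {np.mean(fold_f1_sub):.4f}")
print(f"binary weighted F1:   {np.mean(fold_f1_bin):.4f}")
```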
Early diagnosis of melanoma, which can save thousands of lives, relies heavily on the analysis of dermoscopic images. One crucial diagnostic criterion is the identification of an atypical pigment network (PN); however, distinguishing between regular (typical) and irregular (atypical) PN is challenging. This study aims to automate PN detection using a directional imaging algorithm and to classify PN types using machine learning classifiers. The directional imaging algorithm incorporates Principal Component Analysis (PCA), contrast enhancement, filtering, and noise reduction. Applied to the PH2 dataset, this algorithm achieved a 96% success rate, which increased to 100% after pixel intensity adjustments. From these results, we created a new dataset containing only PN images. We then employed two classifiers, a Convolutional Neural Network (CNN) and Bag of Features (BoF), to categorize PN into atypical and typical classes. Given the limited dataset of 200 images, a simple and effective CNN was designed, featuring two convolutional layers and two batch normalization layers. The proposed CNN achieved 90% accuracy, 90% sensitivity, and 89% specificity, outperforming state-of-the-art methods. Our study highlights the potential of the proposed CNN model for effective PN classification and suggests that future research should focus on expanding datasets and incorporating additional dermatological features to further enhance melanoma diagnosis.
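The abstract does not spell out the directional imaging algorithm; the sketch below illustrates one plausible reading of its contrast-enhancement, filtering, and noise-reduction stages using standard scikit-image tools, with Gabor filters standing in for the directional analysis (the PCA step is omitted). All parameter values are illustrative.

```python
import numpy as np
from skimage import color, exposure, filters

def directional_response(rgb_image):
    """Contrast-enhance, denoise, and apply Gabor filters at several
    orientations; the per-pixel maximum response highlights line-like
    pigment-network structures."""
    gray = color.rgb2gray(rgb_image)          # to grayscale, range [0, 1]
    gray = exposure.equalize_adapthist(gray)  # local contrast enhancement
    gray = filters.median(gray)               # noise reduction
    responses = []
    for theta in np.linspace(0, np.pi, 8, endpoint=False):
        real, _ = filters.gabor(gray, frequency=0.25, theta=theta)
        responses.append(real)
    return np.max(np.stack(responses), axis=0)

# response = directional_response(image)  # image: an RGB array, e.g. from PH2
```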
The color of skin lesions is an important diagnostic feature for identifying malignant melanoma and other skin diseases. Typical colors associated with melanocytic lesions include tan, brown, black, red, white, and blue-gray. This study introduces a novel feature: the number of colors present in a lesion, which can indicate the severity of disease and help distinguish melanomas from benign lesions. We propose a color histogram analysis method to examine lesion pixel values from three publicly available datasets: PH2, ISIC2016, and MED-NODE. The PH2 dataset contains ground-truth annotations of lesion colors, while ISIC2016 and MED-NODE do not; for the latter two, our algorithm estimates the ground truth using color histogram analysis calibrated on PH2. We then design and train a 19-layer Convolutional Neural Network (CNN) with residual skip connections to classify lesions into three categories based on the number of colors present. DeepDream visualization is used to interpret the features learned by the network, and multiple CNN configurations are tested. The best model achieves a weighted F1-score of 75%. LIME is applied to identify the image regions that most influence model decisions. The results show that the number of colors in a lesion is a significant feature for describing skin conditions, and the proposed CNN with three skip connections demonstrates strong potential for clinical diagnostic support.
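A minimal sketch of the color-counting idea: each lesion pixel is assigned to its nearest reference shade, and a color counts as present when it covers a minimum fraction of the lesion. The RGB reference values and the 5% cutoff are assumptions for illustration, not values from the paper.

```python
import numpy as np

# Illustrative reference shades for the six colors clinically associated
# with melanocytic lesions (not calibrated values from the paper).
REFERENCE_COLORS = {
    "white":     (230, 230, 230),
    "red":       (200,  60,  60),
    "tan":       (210, 180, 140),
    "brown":     (120,  70,  40),
    "blue-gray": (110, 130, 150),
    "black":     ( 40,  30,  25),
}

def count_lesion_colors(lesion_pixels, min_fraction=0.05):
    """Assign each lesion pixel to its nearest reference color and count
    how many colors cover at least `min_fraction` of the lesion."""
    refs = np.array(list(REFERENCE_COLORS.values()), dtype=float)
    dists = np.linalg.norm(lesion_pixels[:, None, :].astype(float) - refs[None], axis=2)
    nearest = dists.argmin(axis=1)
    fractions = np.bincount(nearest, minlength=len(refs)) / len(lesion_pixels)
    return int((fractions >= min_fraction).sum())

# Example: pixels drawn from two reference shades -> two colors detected.
pixels = np.vstack([np.full((500, 3), (120, 70, 40)), np.full((500, 3), (40, 30, 25))])
print(count_lesion_colors(pixels))  # 2
```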
Melanoma is the most lethal subtype of skin cancer, and early, accurate detection can greatly improve patient outcomes. Although machine learning models, especially convolutional neural networks (CNNs), have shown great potential in automating melanoma classification, their diagnostic reliability still suffers from inconsistent focus on lesion areas. In this study, we analyze the relationship between lesion attention and diagnostic performance using masked images, bounding-box detection, and transfer learning. We apply multiple explainability and sensitivity analysis approaches to investigate how well models align their attention with lesion areas and how this alignment correlates with precision, recall, and F1-score. Results showed that models with a stronger focus on lesion areas achieved better diagnostic performance, suggesting the potential of interpretable AI in medical diagnostics. This study provides a foundation for developing more accurate and trustworthy melanoma classification models.
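One simple way to quantify the attention/lesion alignment discussed above is the fraction of a (Grad-CAM-style) heatmap's mass that falls inside the lesion mask; the paper's exact explainability and sensitivity analyses may differ. A toy example:

```python
import numpy as np

def attention_inside_lesion(cam, lesion_mask):
    """Fraction of the non-negative attention map's mass that falls inside
    the lesion mask -- one common alignment score."""
    cam = np.clip(cam, 0, None)
    total = cam.sum()
    if total == 0:
        return 0.0
    return float(cam[lesion_mask.astype(bool)].sum() / total)

# Toy 4x4 heatmap whose activations all fall inside the mask -> score 1.0.
cam = np.array([[0, 0, 0, 0],
                [0, 5, 5, 0],
                [0, 5, 1, 0],
                [0, 0, 0, 0]], dtype=float)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
print(attention_inside_lesion(cam, mask))  # 1.0
```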
Skin cancer is one of the most common and dangerous cancers worldwide and requires timely, precise diagnosis. In this paper, we describe a deep-learning architecture for multi-class skin lesion classification on the HAM10000 dataset. The proposed system combines data balancing methods, large-scale data augmentation, a hybrid EfficientNetV2-L backbone with channel attention, and a three-stage progressive learning approach. We also apply explainable AI (XAI) techniques such as Grad-CAM and saliency maps to produce intelligible visual explanations of model predictions. Our approach achieves an overall accuracy of 91.15%, a macro F1-score of 85.45%, and a micro-average AUC of 99.33%. The model performs strongly across all seven lesion classes, particularly for melanoma and melanocytic nevi. Beyond enhancing diagnostic transparency, XAI helps identify the visual characteristics that drive the classifications, strengthening clinical trustworthiness.
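A hedged sketch of the backbone-plus-channel-attention design, using a squeeze-and-excitation block (one common form of channel attention; the paper's exact module, progressive training stages, and augmentation are not reproduced here) on top of torchvision's EfficientNetV2-L:

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_v2_l

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )
    def forward(self, x):                    # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))      # squeeze over spatial dims -> (B, C)
        return x * w[:, :, None, None]       # excite: reweight channels

class AttentiveEffNet(nn.Module):
    def __init__(self, num_classes=7):       # seven HAM10000 lesion classes
        super().__init__()
        backbone = efficientnet_v2_l(weights="IMAGENET1K_V1")
        self.features = backbone.features    # conv trunk, 1280 output channels
        self.attention = ChannelAttention(1280)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Dropout(0.3), nn.Linear(1280, num_classes))
    def forward(self, x):
        return self.head(self.attention(self.features(x)))

model = AttentiveEffNet().eval()
print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 7])
```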
Accurate and early diagnosis of malignant melanoma is critical for improving patient outcomes. While convolutional neural networks (CNNs) have shown promise in dermoscopic image analysis, they often neglect clinical metadata and require extensive preprocessing. Vision-language models (VLMs) offer a multimodal alternative but struggle to capture clinical specificity when trained on general-domain data. To address this, we propose a retrieval-augmented VLM framework that incorporates semantically similar patient cases into the diagnostic prompt. Our method enables informed predictions without fine-tuning and significantly improves classification accuracy and error correction over conventional baselines. These results demonstrate that retrieval-augmented prompting provides a robust strategy for clinical decision support.
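Conceptually, the retrieval-augmented prompt can be built by embedding the query image, retrieving the k nearest archived cases by cosine similarity, and prepending their descriptions to the diagnostic question. In the sketch below, `embed` and `query_vlm` are hypothetical stand-ins for whatever embedding and VLM interfaces are used, and the prompt wording is ours, not the paper's.

```python
import numpy as np

def build_rag_prompt(query_embedding, case_embeddings, case_descriptions, k=3):
    """Retrieve the k most similar archived cases by cosine similarity and
    fold their descriptions into the diagnostic prompt."""
    q = query_embedding / np.linalg.norm(query_embedding)
    c = case_embeddings / np.linalg.norm(case_embeddings, axis=1, keepdims=True)
    top_k = np.argsort(c @ q)[::-1][:k]      # indices of the k nearest cases
    context = "\n".join(f"- Similar case: {case_descriptions[i]}" for i in top_k)
    return (
        "You are assisting with dermoscopic image assessment.\n"
        f"Reference cases retrieved from the archive:\n{context}\n"
        "Given the attached image, is the lesion benign or malignant?"
    )

# prompt = build_rag_prompt(embed(image), archive_embeddings, archive_notes)
# answer = query_vlm(image, prompt)          # hypothetical VLM call
```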
Spitz tumors are diagnostically challenging because their atypical histological features overlap with those of conventional melanomas. We investigated to what extent AI models, using histological and/or clinical features, can: (1) distinguish Spitz tumors from conventional melanomas; (2) predict the underlying genetic aberration of Spitz tumors; and (3) predict the diagnostic category of Spitz tumors. The AI models were developed and validated on a dataset of 393 Spitz tumors and 379 conventional melanomas. Predictive performance was measured using AUROC and accuracy, and compared with that of four experienced pathologists in a reader study. Moreover, a simulation experiment investigated the impact of implementing AI-based recommendations for ancillary diagnostic testing on the workflow of the pathology department. The best AI model, based on UNI features, reached an AUROC of 0.95 and an accuracy of 0.86 in differentiating Spitz tumors from conventional melanomas. The genetic aberration was predicted with an accuracy of 0.55, compared with 0.25 for random guessing. The diagnostic category was predicted with an accuracy of 0.51, where chance-level accuracy equaled 0.33. On all three tasks, the AI models performed better than the four pathologists, although the differences were not statistically significant for most individual comparisons. Based on the simulation experiment, implementing AI-based recommendations for ancillary diagnostic testing could reduce material costs, turnaround times, and the number of examinations. In conclusion, the AI models achieved strong predictive performance in distinguishing Spitz tumors from conventional melanomas, and on the more challenging tasks of predicting the genetic aberration and the diagnostic category they performed clearly better than chance.
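For reference, the headline metrics are standard and can be computed as below; the predictions here are synthetic placeholders, and chance-level accuracy for a balanced k-class task is simply 1/k (0.25 for four genetic aberration groups, 0.33 for three diagnostic categories).

```python
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score

# Synthetic placeholder scores for the binary Spitz-vs-melanoma task.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_score = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, size=200), 0, 1)

print(f"AUROC:    {roc_auc_score(y_true, y_score):.2f}")
print(f"Accuracy: {accuracy_score(y_true, y_score > 0.5):.2f}")
```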
Melanoma, one of the deadliest types of skin cancer, accounts for thousands of fatalities globally. The bluish, blue-whitish, or blue-white veil (BWV) is a critical feature for diagnosing melanoma, yet research into detecting BWV in dermatological images is limited. This study converts a non-annotated skin lesion dataset into an annotated one using a proposed imaging algorithm based on color-threshold techniques applied to lesion patches and color palettes. A Deep Convolutional Neural Network (DCNN) is designed and trained separately on three dermoscopic datasets, individually and in combination, using custom layers in place of standard activation function layers, to categorize skin lesions by the presence of BWV. The proposed DCNN outperforms conventional BWV detection models across datasets, achieving a testing accuracy of 85.71% on the augmented PH2 dataset, 95.00% on the augmented ISIC archive dataset, 95.05% on the combined augmented (PH2+ISIC archive) dataset, and 90.00% on the Derm7pt dataset. An explainable artificial intelligence (XAI) algorithm is then applied to interpret the DCNN's decision-making in BWV detection. The proposed approach, coupled with XAI, significantly improves BWV detection in skin lesions, outperforming existing models and providing a robust tool for early melanoma diagnosis.
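A minimal sketch of the color-threshold annotation idea: flag a lesion patch as BWV-positive when enough of its pixels fall inside a blue-white region of RGB space. The threshold box and the 10% cutoff are illustrative assumptions, not the paper's calibrated palette values.

```python
import numpy as np

def bwv_fraction(lesion_rgb, min_fraction=0.10):
    """Label a patch BWV-positive when the fraction of pixels inside an
    (illustrative) blue-white RGB threshold box exceeds `min_fraction`."""
    r = lesion_rgb[..., 0].astype(float)
    g = lesion_rgb[..., 1].astype(float)
    b = lesion_rgb[..., 2].astype(float)
    bwv_pixels = (b > 120) & (b > r) & (g > r * 0.8) & (r < 180)
    fraction = bwv_pixels.mean()
    return bool(fraction >= min_fraction), float(fraction)

# Example: a uniformly bluish-gray 32x32 patch is flagged as BWV-positive.
patch = np.zeros((32, 32, 3), dtype=np.uint8)
patch[..., 0], patch[..., 1], patch[..., 2] = 120, 150, 170
print(bwv_fraction(patch))  # (True, 1.0)
```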
In dermoscopic images, which visualize surface skin structures not visible to the naked eye, lesion shape offers vital insights into skin diseases, and asymmetric lesion shape is one of the clinically practiced criteria for diagnosing melanoma. We first labeled a non-annotated dataset with symmetry information based on clinical assessments. We then propose a supporting technique, a supervised image processing algorithm, that analyzes the geometric pattern of lesion shape and helps non-experts understand the criteria for an asymmetric lesion. Finally, we use a pre-trained convolutional neural network (CNN) to extract shape, color, and texture features from dermoscopic images and train a multiclass support vector machine (SVM) classifier, outperforming state-of-the-art methods from the literature. In the geometry-based experiment, we achieved a 99.00% detection rate for asymmetric lesions. In the CNN-based experiment, the best model achieves a 94% Kappa score, 95% macro F1-score, and 97% weighted F1-score for classifying lesion shapes (Asymmetric, Half-Symmetric, and Symmetric).
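As a sketch of the geometry-based analysis, one simple scheme reflects the lesion mask across axes through its centroid and measures the Jaccard overlap between the mask and its reflection; thresholding the two scores into Symmetric, Half-Symmetric, and Asymmetric is a natural follow-up. The paper's supervised algorithm is more elaborate; this is an assumption-laden illustration.

```python
import numpy as np

def symmetry_scores(mask):
    """Jaccard overlap between a lesion mask and its reflection across the
    vertical and horizontal axes through the lesion centroid."""
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    scores = []
    for axis, c in ((1, cx), (0, cy)):
        flipped = np.flip(mask, axis=axis)
        # Shift so the reflection is about the centroid, not the image center.
        shift = int(round(2 * c - (mask.shape[axis] - 1)))
        flipped = np.roll(flipped, shift, axis=axis)
        inter = np.logical_and(mask, flipped).sum()
        union = np.logical_or(mask, flipped).sum()
        scores.append(inter / union)
    return scores  # [vertical-axis score, horizontal-axis score]

# A rectangle is symmetric about both axes, so both scores are ~1.0.
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 15:50] = True
print([round(s, 2) for s in symmetry_scores(mask)])  # [1.0, 1.0]
```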
Pigmented skin lesions represent localized areas of increased melanin and can indicate serious conditions like melanoma, a major contributor to skin cancer mortality. The MedMNIST v2 collection, inspired by MNIST, was recently introduced to advance research in biomedical imaging and includes DermaMNIST, a dataset for classifying pigmented lesions derived from HAM10000. This study assesses ResNet-50 and EfficientNetV2-L models for multi-class classification on DermaMNIST, employing transfer learning and various layer configurations. One configuration achieves results that match or surpass existing methods. These findings suggest that convolutional neural networks (CNNs) can drive progress in biomedical image analysis, significantly enhancing diagnostic accuracy.
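A minimal transfer-learning sketch in the spirit of the setup above: an ImageNet-pretrained ResNet-50 with a re-initialized seven-class head for DermaMNIST. Loading the data via the medmnist package is assumed, and the frozen-backbone and hyperparameter choices here are ours, not the paper's configurations.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# ImageNet-pretrained ResNet-50 with a new 7-class head (HAM10000 classes).
model = models.resnet50(weights="IMAGENET1K_V2")
for p in model.parameters():
    p.requires_grad = False                    # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, 7)  # trainable classification head

preprocess = transforms.Compose([
    transforms.Resize(224),                    # upsample 28x28 DermaMNIST images
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
# for images, labels in train_loader:          # e.g. built from medmnist.DermaMNIST
#     optimizer.zero_grad()
#     loss = criterion(model(images), labels.squeeze())
#     loss.backward(); optimizer.step()
```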