Abstract:The potential of deep neural networks in skin lesion classification has already been demonstrated to be on-par if not superior to the dermatologists diagnosis. However, the performance of these models usually deteriorates when the test data differs significantly from the training data (i.e. domain shift). This concerning limitation for models intended to be used in real-world skin lesion classification tasks poses a risk to patients. For example, different image acquisition systems or previously unseen anatomical sites on the patient can suffice to cause such domain shifts. Mitigating the negative effect of such shifts is therefore crucial, but developing effective methods to address domain shift has proven to be challenging. In this study, we carry out an in-depth analysis of eight different unsupervised domain adaptation methods to analyze their effectiveness in improving generalization for dermoscopic datasets. To ensure robustness of our findings, we test each method on a total of ten distinct datasets, thereby covering a variety of possible domain shifts. In addition, we investigated which factors in the domain shifted datasets have an impact on the effectiveness of domain adaptation methods. Our findings show that all of the eight domain adaptation methods result in improved AUPRC for the majority of analyzed datasets. Altogether, these results indicate that unsupervised domain adaptations generally lead to performance improvements for the binary melanoma-nevus classification task regardless of the nature of the domain shift. However, small or heavily imbalanced datasets lead to a reduced conformity of the results due to the influence of these factors on the methods performance.
Abstract:The limited ability of Convolutional Neural Networks to generalize to images from previously unseen domains is a major limitation, in particular, for safety-critical clinical tasks such as dermoscopic skin cancer classification. In order to translate CNN-based applications into the clinic, it is essential that they are able to adapt to domain shifts. Such new conditions can arise through the use of different image acquisition systems or varying lighting conditions. In dermoscopy, shifts can also occur as a change in patient age or occurence of rare lesion localizations (e.g. palms). These are not prominently represented in most training datasets and can therefore lead to a decrease in performance. In order to verify the generalizability of classification models in real world clinical settings it is crucial to have access to data which mimics such domain shifts. To our knowledge no dermoscopic image dataset exists where such domain shifts are properly described and quantified. We therefore grouped publicly available images from ISIC archive based on their metadata (e.g. acquisition location, lesion localization, patient age) to generate meaningful domains. To verify that these domains are in fact distinct, we used multiple quantification measures to estimate the presence and intensity of domain shifts. Additionally, we analyzed the performance on these domains with and without an unsupervised domain adaptation technique. We observed that in most of our grouped domains, domain shifts in fact exist. Based on our results, we believe these datasets to be helpful for testing the generalization capabilities of dermoscopic skin cancer classifiers.