Abstract:The growing demand for accurate and equitable AI models in digital dermatology faces a significant challenge: the lack of diverse, high-quality labeled data. In this work, we investigate the potential of domain-specific foundation models for dermatology in addressing this challenge. We utilize self-supervised learning (SSL) techniques to pre-train models on a dataset of over 240,000 dermatological images from public and private collections. Our study considers several SSL methods and compares the resulting foundation models against domain-agnostic models like those pre-trained on ImageNet and state-of-the-art models such as MONET across 12 downstream tasks. Unlike previous research, we emphasize the development of smaller models that are more suitable for resource-limited clinical settings, facilitating easier adaptation to a broad range of use cases. Results show that models pre-trained in this work not only outperform general-purpose models but also approach the performance of models 50 times larger on clinically relevant diagnostic tasks. To promote further research in this direction, we publicly release both the training code and the foundation models, which can benefit clinicians in dermatological applications.
Abstract:Africa faces a huge shortage of dermatologists, with less than one per million people. This is in stark contrast to the high demand for dermatologic care, with 80% of the paediatric population suffering from largely untreated skin conditions. The integration of AI into healthcare sparks significant hope for treatment accessibility, especially through the development of AI-supported teledermatology. Current AI models are predominantly trained on white-skinned patients and do not generalize well enough to pigmented patients. The PASSION project aims to address this issue by collecting images of skin diseases in Sub-Saharan countries with the aim of open-sourcing this data. This dataset is the first of its kind, consisting of 1,653 patients for a total of 4,901 images. The images are representative of telemedicine settings and encompass the most common paediatric conditions: eczema, fungals, scabies, and impetigo. We also provide a baseline machine learning model trained on the dataset and a detailed performance analysis for the subpopulations represented in the dataset. The project website can be found at https://passionderm.github.io/.
Abstract:Real-time bioaerosol monitoring is improving the quality of life for people affected by allergies, but it often relies on deep-learning models which pose challenges for widespread adoption. These models are typically trained in a supervised fashion and require considerable effort to produce large amounts of annotated data, an effort that must be repeated for new particles, geographical regions, or measurement systems. In this work, we show that self-supervised learning and few-shot learning can be combined to classify holographic images of bioaerosol particles using a large collection of unlabelled data and only a few examples for each particle type. We first demonstrate that self-supervision on pictures of unidentified particles from ambient air measurements enhances identification even when labelled data is abundant. Most importantly, it greatly improves few-shot classification when only a handful of labelled images are available. Our findings suggest that real-time bioaerosol monitoring workflows can be substantially optimized, and the effort required to adapt models for different situations considerably reduced.
Abstract:Out-Of-Distribution (OOD) detection is critical to deploy deep learning models in safety-critical applications. However, the inherent hierarchical concept structure of visual data, which is instrumental to OOD detection, is often poorly captured by conventional methods based on Euclidean geometry. This work proposes a metric framework that leverages the strengths of Hyperbolic geometry for OOD detection. Inspired by previous works that refine the decision boundary for OOD data with synthetic outliers, we extend this method to Hyperbolic space. Interestingly, we find that synthetic outliers do not benefit OOD detection in Hyperbolic space as they do in Euclidean space. Furthermore we explore the relationship between OOD detection performance and Hyperbolic embedding dimension, addressing practical concerns in resource-constrained environments. Extensive experiments show that our framework improves the FPR95 for OOD detection from 22\% to 15\% and from 49% to 28% on CIFAR-10 and CIFAR-100 respectively compared to Euclidean methods.
Abstract:Benchmark datasets for digital dermatology unwittingly contain inaccuracies that reduce trust in model performance estimates. We propose a resource-efficient data cleaning protocol to identify issues that escaped previous curation. The protocol leverages an existing algorithmic cleaning strategy and is followed by a confirmation process terminated by an intuitive stopping criterion. Based on confirmation by multiple dermatologists, we remove irrelevant samples and near duplicates and estimate the percentage of label errors in six dermatology image datasets for model evaluation promoted by the International Skin Imaging Collaboration. Along with this paper, we publish revised file lists for each dataset which should be used for model evaluation. Our work paves the way for more trustworthy performance assessment in digital dermatology.
Abstract:Many hotels target guest acquisition efforts to specific markets in order to best anticipate individual preferences and needs of their guests. Likewise, such strategic positioning is a prerequisite for efficient marketing budget allocation. Official statistics report on the number of visitors from different countries, but no fine-grained information on the guest composition of individual businesses exists. There is, however, growing interest in such data from competitors, suppliers, researchers and the general public. We demonstrate how machine learning can be leveraged to extract references to guest nationalities from unstructured text reviews in order to dynamically assess and monitor the dynamics of guest composition of individual businesses. In particular, we show that a rather simple architecture of pre-trained embeddings and stacked LSTM layers provides a better performance-runtime tradeoff than more complex state-of-the-art language models.
Abstract:This paper presents a new robust loss function, the T-Loss, for medical image segmentation. The proposed loss is based on the negative log-likelihood of the Student-t distribution and can effectively handle outliers in the data by controlling its sensitivity with a single parameter. This parameter is updated during the backpropagation process, eliminating the need for additional computation or prior information about the level and spread of noisy labels. Our experiments show that the T-Loss outperforms traditional loss functions in terms of dice scores on two public medical datasets for skin lesion and lung segmentation. We also demonstrate the ability of T-Loss to handle different types of simulated label noise, resembling human error. Our results provide strong evidence that the T-Loss is a promising alternative for medical image segmentation where high levels of noise or outliers in the dataset are a typical phenomenon in practice. The project website can be found at https://robust-tloss.github.io
Abstract:Most commonly used benchmark datasets for computer vision contain irrelevant images, near duplicates, and label errors. Consequently, model performance on these benchmarks may not be an accurate estimate of generalization ability. This is a particularly acute concern in computer vision for medicine where datasets are typically small, stakes are high, and annotation processes are expensive and error-prone. In this paper, we propose SelfClean, a general procedure to clean up image datasets exploiting a latent space learned with self-supervision. By relying on self-supervised learning, our approach focuses on intrinsic properties of the data and avoids annotation biases. We formulate dataset cleaning as either a set of ranking problems, where human experts can make decisions with significantly reduced effort, or a set of scoring problems, where decisions can be fully automated based on score distributions. We compare SelfClean against other algorithms on common computer vision benchmarks enhanced with synthetic noise and demonstrate state-of-the-art performance on detecting irrelevant images, near duplicates, and label errors. In addition, we apply our method to multiple image datasets and confirm an improvement in evaluation reliability.
Abstract:We propose an unsupervised domain adaptation (UDA) approach for white matter hyperintensity (WMH) segmentation, which uses Self-Training with Uncertainty DEpendent Label refinement (STRUDEL). Self-training has recently been introduced as a highly effective method for UDA, which is based on self-generated pseudo labels. However, pseudo labels can be very noisy and therefore deteriorate model performance. We propose to predict the uncertainty of pseudo labels and integrate it in the training process with an uncertainty-guided loss function to highlight labels with high certainty. STRUDEL is further improved by incorporating the segmentation output of an existing method in the pseudo label generation that showed high robustness for WMH segmentation. In our experiments, we evaluate STRUDEL with a standard U-Net and a modified network with a higher receptive field. Our results on WMH segmentation across datasets demonstrate the significant improvement of STRUDEL with respect to standard self-training.
Abstract:The 2020 Multi-channel Magnetic Resonance Reconstruction (MC-MRRec) Challenge had two primary goals: 1) compare different MR image reconstruction models on a large dataset and 2) assess the generalizability of these models to datasets acquired with a different number of receiver coils (i.e., multiple channels). The challenge had two tracks: Track 01 focused on assessing models trained and tested with 12-channel data. Track 02 focused on assessing models trained with 12-channel data and tested on both 12-channel and 32-channel data. While the challenge is ongoing, here we describe the first edition of the challenge and summarise submissions received prior to 5 September 2020. Track 01 had five baseline models and received four independent submissions. Track 02 had two baseline models and received two independent submissions. This manuscript provides relevant comparative information on the current state-of-the-art of MR reconstruction and highlights the challenges of obtaining generalizable models that are required prior to clinical adoption. Both challenge tracks remain open and will provide an objective performance assessment for future submissions. Subsequent editions of the challenge are proposed to investigate new concepts and strategies, such as the integration of potentially available longitudinal information during the MR reconstruction process. An outline of the proposed second edition of the challenge is presented in this manuscript.