Marc Pouly

Towards Scalable Foundation Models for Digital Dermatology

Nov 08, 2024

Fabian Gröger, Philippe Gottfrois, Ludovic Amruthalingam, Alvaro Gonzalez-Jimenez, Simone Lionetti, Luis R. Soenksen-Martinez, Alexander A. Navarini, Marc Pouly

Abstract: The growing demand for accurate and equitable AI models in digital dermatology faces a significant challenge: the lack of diverse, high-quality labeled data. In this work, we investigate the potential of domain-specific foundation models for dermatology in addressing this challenge. We utilize self-supervised learning (SSL) techniques to pre-train models on a dataset of over 240,000 dermatological images from public and private collections. Our study considers several SSL methods and compares the resulting foundation models against domain-agnostic models like those pre-trained on ImageNet and state-of-the-art models such as MONET across 12 downstream tasks. Unlike previous research, we emphasize the development of smaller models that are more suitable for resource-limited clinical settings, facilitating easier adaptation to a broad range of use cases. Results show that models pre-trained in this work not only outperform general-purpose models but also approach the performance of models 50 times larger on clinically relevant diagnostic tasks. To promote further research in this direction, we publicly release both the training code and the foundation models, which can benefit clinicians in dermatological applications.

* Findings paper presented at Machine Learning for Health (ML4H) symposium 2024, December 15-16, 2024, Vancouver, Canada, 11 pages

PASSION for Dermatology: Bridging the Diversity Gap with Pigmented Skin Images from Sub-Saharan Africa

Nov 07, 2024

Philippe Gottfrois, Fabian Gröger, Faly Herizo Andriambololoniaina, Ludovic Amruthalingam, Alvaro Gonzalez-Jimenez, Christophe Hsu, Agnes Kessy, Simone Lionetti, Daudi Mavura, Wingston Ng'ambi (+6 more)

Abstract: Africa faces a huge shortage of dermatologists, with fewer than one per million people. This is in stark contrast to the high demand for dermatologic care, with 80% of the paediatric population suffering from largely untreated skin conditions. The integration of AI into healthcare sparks significant hope for treatment accessibility, especially through the development of AI-supported teledermatology. Current AI models are predominantly trained on white-skinned patients and do not generalize well enough to pigmented patients. The PASSION project aims to address this issue by collecting images of skin diseases in Sub-Saharan countries with the aim of open-sourcing this data. This dataset is the first of its kind, consisting of 1,653 patients for a total of 4,901 images. The images are representative of telemedicine settings and encompass the most common paediatric conditions: eczema, fungal infections, scabies, and impetigo. We also provide a baseline machine learning model trained on the dataset and a detailed performance analysis for the subpopulations represented in the dataset. The project website can be found at https://passionderm.github.io/.

* Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024, Lecture Notes in Computer Science, vol. 15003, Springer, Cham

Hyperbolic Metric Learning for Visual Outlier Detection

Mar 22, 2024

Alvaro Gonzalez-Jimenez, Simone Lionetti, Dena Bazazian, Philippe Gottfrois, Fabian Gröger, Marc Pouly, Alexander Navarini

Abstract: Out-Of-Distribution (OOD) detection is critical to deploy deep learning models in safety-critical applications. However, the inherent hierarchical concept structure of visual data, which is instrumental to OOD detection, is often poorly captured by conventional methods based on Euclidean geometry. This work proposes a metric framework that leverages the strengths of Hyperbolic geometry for OOD detection. Inspired by previous works that refine the decision boundary for OOD data with synthetic outliers, we extend this method to Hyperbolic space. Interestingly, we find that synthetic outliers do not benefit OOD detection in Hyperbolic space as they do in Euclidean space. Furthermore, we explore the relationship between OOD detection performance and Hyperbolic embedding dimension, addressing practical concerns in resource-constrained environments. Extensive experiments show that our framework improves the FPR95 for OOD detection from 22% to 15% and from 49% to 28% on CIFAR-10 and CIFAR-100, respectively, compared to Euclidean methods.
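
The abstract reports its results as FPR95. This metric is commonly defined as the fraction of OOD samples wrongly accepted as in-distribution (ID) when the threshold is set so that 95% of ID samples are accepted. The sketch below is a minimal pure-Python illustration under that assumption. The function name `fpr_at_95_tpr`, the toy scores, and the convention that higher scores mean "more in-distribution" are choices made here and are not taken from the paper.

```python
def fpr_at_95_tpr(id_scores, ood_scores):
    """False positive rate on OOD data at a 95% true positive rate on ID data.

    Assumes higher scores mean "more in-distribution" (a convention chosen
    for this sketch, not taken from the paper).
    """
    if not id_scores or not ood_scores:
        raise ValueError("both score lists must be non-empty")
    sorted_id = sorted(id_scores)
    # Index of the 5th percentile: about 95% of ID scores lie at or above it.
    k = (5 * len(sorted_id)) // 100
    threshold = sorted_id[k]
    # An OOD sample is a false positive if it is accepted as in-distribution.
    false_positives = sum(1 for s in ood_scores if s >= threshold)
    return false_positives / len(ood_scores)


# Hypothetical toy scores, for illustration only.
id_scores = [float(i) for i in range(100)]
ood_scores = [float(i) for i in range(10)]
print(fpr_at_95_tpr(id_scores, ood_scores))
```

A lower FPR95 is better: at a fixed 95% acceptance rate for genuine samples, fewer outliers slip through.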

Estimating Text Similarity based on Semantic Concept Embeddings

Jan 09, 2024

Tim vor der Brück, Marc Pouly

Abstract: Due to their ease of use and high accuracy, Word2Vec (W2V) word embeddings enjoy great success in the semantic representation of words, sentences, and whole documents as well as for semantic similarity estimation. However, they have the shortcoming that they are directly extracted from a surface representation, which does not adequately represent human thought processes and also performs poorly for highly ambiguous words. Therefore, we propose Semantic Concept Embeddings (CE) based on the MultiNet Semantic Network (SN) formalism, which addresses both shortcomings. The evaluation on a marketing target group distribution task showed that the accuracy of predicted target groups can be increased by combining traditional word embeddings with semantic CEs.

* IARIA Congress Proceedings, 2023

Towards Reliable Dermatology Evaluation Benchmarks

Sep 13, 2023

Fabian Gröger, Simone Lionetti, Philippe Gottfrois, Alvaro Gonzalez-Jimenez, Matthew Groh, Roxana Daneshjou, Labelling Consortium, Alexander A. Navarini, Marc Pouly

Abstract: Benchmark datasets for digital dermatology unwittingly contain inaccuracies that reduce trust in model performance estimates. We propose a resource-efficient data cleaning protocol to identify issues that escaped previous curation. The protocol leverages an existing algorithmic cleaning strategy and is followed by a confirmation process terminated by an intuitive stopping criterion. Based on confirmation by multiple dermatologists, we remove irrelevant samples and near duplicates and estimate the percentage of label errors in six dermatology image datasets for model evaluation promoted by the International Skin Imaging Collaboration. Along with this paper, we publish revised file lists for each dataset which should be used for model evaluation. Our work paves the way for more trustworthy performance assessment in digital dermatology.

* Link to the revised file lists: https://github.com/Digital-Dermatology/SelfClean-Revised-Benchmarks

Assessing Guest Nationality Composition from Hotel Reviews

Aug 11, 2023

Fabian Gröger, Marc Pouly, Flavia Tinner, Leif Brandes

Abstract: Many hotels target guest acquisition efforts to specific markets in order to best anticipate individual preferences and needs of their guests. Likewise, such strategic positioning is a prerequisite for efficient marketing budget allocation. Official statistics report on the number of visitors from different countries, but no fine-grained information on the guest composition of individual businesses exists. There is, however, growing interest in such data from competitors, suppliers, researchers and the general public. We demonstrate how machine learning can be leveraged to extract references to guest nationalities from unstructured text reviews in order to dynamically assess and monitor the guest composition of individual businesses. In particular, we show that a rather simple architecture of pre-trained embeddings and stacked LSTM layers provides a better performance-runtime tradeoff than more complex state-of-the-art language models.

* Gröger, Fabian; Pouly, Marc; Tinner, Flavia & Brandes, Leif (2022). Assessing Guest Nationality Composition from Hotel Reviews. Proceedings of the 9th Swiss Data Science Conference, 1

Robust T-Loss for Medical Image Segmentation

Jun 01, 2023

Alvaro Gonzalez-Jimenez, Simone Lionetti, Philippe Gottfrois, Fabian Gröger, Marc Pouly, Alexander Navarini

Abstract: This paper presents a new robust loss function, the T-Loss, for medical image segmentation. The proposed loss is based on the negative log-likelihood of the Student-t distribution and can effectively handle outliers in the data by controlling its sensitivity with a single parameter. This parameter is updated during the backpropagation process, eliminating the need for additional computation or prior information about the level and spread of noisy labels. Our experiments show that the T-Loss outperforms traditional loss functions in terms of Dice scores on two public medical datasets for skin lesion and lung segmentation. We also demonstrate the ability of T-Loss to handle different types of simulated label noise, resembling human error. Our results provide strong evidence that the T-Loss is a promising alternative for medical image segmentation where high levels of noise or outliers in the dataset are a typical phenomenon in practice. The project website can be found at https://robust-tloss.github.io
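
To illustrate why a Student-t negative log-likelihood is robust to outliers, here is a minimal sketch for a single scalar residual. It uses the textbook univariate Student-t density with location 0. The function name `student_t_nll`, the `sigma` scale argument, and the per-residual formulation are assumptions made for this illustration; the paper's actual loss may be defined differently (for example over whole images), and its sensitivity parameter is learned during backpropagation rather than fixed as it is here.

```python
import math


def student_t_nll(residual, nu=1.0, sigma=1.0):
    """Negative log-likelihood of a univariate Student-t with location 0.

    residual: prediction minus label for one value (hypothetical usage).
    nu: degrees of freedom; smaller values give heavier tails.
    sigma: scale parameter (an assumption of this sketch).
    """
    if nu <= 0 or sigma <= 0:
        raise ValueError("nu and sigma must be positive")
    # Log of the normalising constant of the density, with sign flipped.
    log_norm = (
        math.lgamma(nu / 2.0)
        - math.lgamma((nu + 1.0) / 2.0)
        + 0.5 * math.log(math.pi * nu)
        + math.log(sigma)
    )
    z2 = (residual / sigma) ** 2
    # The penalty grows only logarithmically in the squared residual.
    return log_norm + 0.5 * (nu + 1.0) * math.log1p(z2 / nu)


# Compare how a squared error and the Student-t NLL penalise growing residuals.
for r in (0.1, 1.0, 10.0, 100.0):
    print(r, 0.5 * r * r, student_t_nll(r, nu=1.0))
```

Because the penalty grows logarithmically instead of quadratically, a single badly mislabelled value contributes far less to the total loss than it would under a squared error. For very large `nu` the expression approaches the Gaussian negative log-likelihood, so `nu` interpolates between robust and conventional behaviour.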

* Early accepted to MICCAI 2023

SelfClean: A Self-Supervised Data Cleaning Strategy

May 26, 2023

Fabian Gröger, Simone Lionetti, Philippe Gottfrois, Alvaro Gonzalez-Jimenez, Ludovic Amruthalingam, Labelling Consortium, Matthew Groh, Alexander A. Navarini, Marc Pouly

Abstract: Most commonly used benchmark datasets for computer vision contain irrelevant images, near duplicates, and label errors. Consequently, model performance on these benchmarks may not be an accurate estimate of generalization ability. This is a particularly acute concern in computer vision for medicine where datasets are typically small, stakes are high, and annotation processes are expensive and error-prone. In this paper, we propose SelfClean, a general procedure to clean up image datasets exploiting a latent space learned with self-supervision. By relying on self-supervised learning, our approach focuses on intrinsic properties of the data and avoids annotation biases. We formulate dataset cleaning as either a set of ranking problems, where human experts can make decisions with significantly reduced effort, or a set of scoring problems, where decisions can be fully automated based on score distributions. We compare SelfClean against other algorithms on common computer vision benchmarks enhanced with synthetic noise and demonstrate state-of-the-art performance on detecting irrelevant images, near duplicates, and label errors. In addition, we apply our method to multiple image datasets and confirm an improvement in evaluation reliability.

Cable Tree Wiring -- Benchmarking Solvers on a Real-World Scheduling Problem with a Variety of Precedence Constraints

Nov 25, 2020

Jana Koehler, Joseph Bürgler, Urs Fontana, Etienne Fux, Florian Herzog, Marc Pouly, Sophia Saller, Anastasia Salyaeva, Peter Scheiblechner, Kai Waelti

Abstract: Cable trees are used in industrial products to transmit energy and information between different product parts. To date, they are mostly assembled by humans, and only a few automated manufacturing solutions exist, which use complex robotic machines. For these machines, the wiring plan has to be translated into a wiring sequence of cable plugging operations to be followed by the machine. In this paper, we study and formalize the problem of deriving the optimal wiring sequence for a given layout of a cable tree. We summarize our investigations to model this Cable Tree Wiring (CTW) problem as a traveling salesman problem with atomic, soft atomic, and disjunctive precedence constraints as well as tour-dependent edge costs such that it can be solved by state-of-the-art constraint programming (CP), Optimization Modulo Theories (OMT), and mixed-integer programming (MIP) solvers. We further show how the CTW problem can be viewed as a soft version of the coupled tasks scheduling problem. We discuss various modeling variants for the problem, prove its NP-hardness, and empirically compare CP, OMT, and MIP solvers on a benchmark set of 278 instances. The complete benchmark set with all models and instance data is available on GitHub and has been accepted for inclusion in the MiniZinc Challenge 2020.
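
As a rough illustration of sequencing under precedence constraints, the sketch below brute-forces the cheapest visiting order for a tiny instance. It is a deliberately simplified stand-in and not the paper's model: it handles only plain "a before b" constraints, uses an open path with fixed travel costs on a line, and ignores the atomic, soft atomic, and disjunctive variants as well as tour-dependent edge costs. The function name `best_sequence` and the toy data are hypothetical.

```python
from itertools import permutations


def best_sequence(coords, must_precede):
    """Brute-force the cheapest visiting order that respects precedences.

    coords: dict mapping a job id to a 1-D position (hypothetical data).
    must_precede: iterable of (a, b) pairs meaning "a is done before b".
    Returns (cost, order), or None if no order satisfies the constraints.
    """
    best = None
    for order in permutations(sorted(coords)):
        position = {job: i for i, job in enumerate(order)}
        if any(position[a] > position[b] for a, b in must_precede):
            continue  # this order violates a precedence constraint
        # Cost is the total travel distance between consecutive jobs.
        cost = sum(abs(coords[u] - coords[v]) for u, v in zip(order, order[1:]))
        if best is None or cost < best[0]:
            best = (cost, order)
    return best


# Three jobs on a line; job 2 has to be done before job 0.
coords = {0: 0.0, 1: 1.0, 2: 2.0}
print(best_sequence(coords, must_precede=[(2, 0)]))
```

Enumerating all permutations takes factorial time, so this approach is only viable for a handful of jobs. That is consistent with the NP-hardness result mentioned in the abstract and with the use of dedicated CP, OMT, and MIP solvers.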

Sufficient and necessary conditions for Dynamic Programming in Valuation-Based Systems

Aug 14, 2015

Jordi Roca-Lacostena, Jesus Cerquides, Marc Pouly

Abstract: Valuation algebras abstract a large number of formalisms for automated reasoning and enable the definition of generic inference procedures. Many of these formalisms provide some notion of solution. Typical examples are satisfying assignments in constraint systems, models in logics or solutions to linear equation systems. Many widely used dynamic programming algorithms for optimization problems rely on low treewidth decompositions and can be understood as particular cases of a single algorithmic scheme for finding solutions in a valuation algebra. The most encompassing description of this algorithmic scheme to date has been proposed by Pouly and Kohlas together with sufficient conditions for its correctness. Unfortunately, the formalization relies on a theorem for which we provide counterexamples. In spite of that, the main line of Pouly and Kohlas' theory is correct, although some of the necessary conditions have to be revised. In this paper, we analyze the impact that the counterexamples have on the theory, and rebuild the theory providing correct sufficient conditions for the algorithms. Furthermore, we also provide necessary conditions for the algorithms, allowing for a sharper characterization of when the algorithmic scheme can be applied.
