Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Stergios Christodoulidis

Full Conformal Adaptation of Medical Vision-Language Models

Jun 06, 2025

Julio Silva-Rodríguez, Leo Fillioux, Paul-Henry Cournède, Maria Vakalopoulou, Stergios Christodoulidis, Ismail Ben Ayed, Jose Dolz

Abstract:Vision-language models (VLMs) pre-trained at large scale have shown unprecedented transferability capabilities and are being progressively integrated into medical image analysis. Although its discriminative potential has been widely explored, its reliability aspect remains overlooked. This work investigates their behavior under the increasingly popular split conformal prediction (SCP) framework, which theoretically guarantees a given error level on output sets by leveraging a labeled calibration set. However, the zero-shot performance of VLMs is inherently limited, and common practice involves few-shot transfer learning pipelines, which cannot absorb the rigid exchangeability assumptions of SCP. To alleviate this issue, we propose full conformal adaptation, a novel setting for jointly adapting and conformalizing pre-trained foundation models, which operates transductively over each test data point using a few-shot adaptation set. Moreover, we complement this framework with SS-Text, a novel training-free linear probe solver for VLMs that alleviates the computational cost of such a transductive approach. We provide comprehensive experiments using 3 different modality-specialized medical VLMs and 9 adaptation tasks. Our framework requires exactly the same data as SCP, and provides consistent relative improvements of up to 27% on set efficiency while maintaining the same coverage guarantees.

* IPMI 2025. Code: https://github.com/jusiro/FCA

Via

Access Paper or Ask Questions

BayesAdapter: enhanced uncertainty estimation in CLIP few-shot adaptation

Dec 12, 2024

Pablo Morales-Álvarez, Stergios Christodoulidis, Maria Vakalopoulou, Pablo Piantanida, Jose Dolz

Figure 1 for BayesAdapter: enhanced uncertainty estimation in CLIP few-shot adaptation

Figure 2 for BayesAdapter: enhanced uncertainty estimation in CLIP few-shot adaptation

Figure 3 for BayesAdapter: enhanced uncertainty estimation in CLIP few-shot adaptation

Figure 4 for BayesAdapter: enhanced uncertainty estimation in CLIP few-shot adaptation

Abstract:The emergence of large pre-trained vision-language models (VLMs) represents a paradigm shift in machine learning, with unprecedented results in a broad span of visual recognition tasks. CLIP, one of the most popular VLMs, has exhibited remarkable zero-shot and transfer learning capabilities in classification. To transfer CLIP to downstream tasks, adapters constitute a parameter-efficient approach that avoids backpropagation through the large model (unlike related prompt learning methods). However, CLIP adapters have been developed to target discriminative performance, and the quality of their uncertainty estimates has been overlooked. In this work we show that the discriminative performance of state-of-the-art CLIP adapters does not always correlate with their uncertainty estimation capabilities, which are essential for a safe deployment in real-world scenarios. We also demonstrate that one of such adapters is obtained through MAP inference from a more general probabilistic framework. Based on this observation we introduce BayesAdapter, which leverages Bayesian inference to estimate a full probability distribution instead of a single point, better capturing the variability inherent in the parameter space. In a comprehensive empirical evaluation we show that our approach obtains high quality uncertainty estimates in the predictions, standing out in calibration and selective classification. Our code is publicly available at: https://github.com/pablomorales92/BayesAdapter.

* 30 pages, 5 figures, 23 tables

Via

Access Paper or Ask Questions

Are foundation models for computer vision good conformal predictors?

Dec 08, 2024

Leo Fillioux, Julio Silva-Rodríguez, Ismail Ben Ayed, Paul-Henry Cournède, Maria Vakalopoulou, Stergios Christodoulidis, Jose Dolz

Figure 1 for Are foundation models for computer vision good conformal predictors?

Figure 2 for Are foundation models for computer vision good conformal predictors?

Figure 3 for Are foundation models for computer vision good conformal predictors?

Figure 4 for Are foundation models for computer vision good conformal predictors?

Abstract:Recent advances in self-supervision and constrastive learning have brought the performance of foundation models to unprecedented levels in a variety of tasks. Fueled by this progress, these models are becoming the prevailing approach for a wide array of real-world vision problems, including risk-sensitive and high-stakes applications. However, ensuring safe deployment in these scenarios requires a more comprehensive understanding of their uncertainty modeling capabilities, which has been barely explored. In this work, we delve into the behavior of vision and vision-language foundation models under Conformal Prediction (CP), a statistical framework that provides theoretical guarantees of marginal coverage of the true class. Across extensive experiments including popular vision classification benchmarks, well-known foundation vision models, and three CP methods, our findings reveal that foundation models are well-suited for conformalization procedures, particularly those integrating Vision Transformers. Furthermore, we show that calibrating the confidence predictions of these models leads to efficiency degradation of the conformal set on adaptive CP methods. In contrast, few-shot adaptation to downstream tasks generally enhances conformal scores, where we identify Adapters as a better conformable alternative compared to Prompt Learning strategies. Our empirical study identifies APS as particularly promising in the context of vision foundation models, as it does not violate the marginal coverage property across multiple challenging, yet realistic scenarios.

Via

Access Paper or Ask Questions

On the detection of Out-Of-Distribution samples in Multiple Instance Learning

Sep 11, 2023

Loïc Le Bescond, Maria Vakalopoulou, Stergios Christodoulidis, Fabrice André, Hugues Talbot

Figure 1 for On the detection of Out-Of-Distribution samples in Multiple Instance Learning

Figure 2 for On the detection of Out-Of-Distribution samples in Multiple Instance Learning

Abstract:The deployment of machine learning solutions in real-world scenarios often involves addressing the challenge of out-of-distribution (OOD) detection. While significant efforts have been devoted to OOD detection in classical supervised settings, the context of weakly supervised learning, particularly the Multiple Instance Learning (MIL) framework, remains under-explored. In this study, we tackle this challenge by adapting post-hoc OOD detection methods to the MIL setting while introducing a novel benchmark specifically designed to assess OOD detection performance in weakly supervised scenarios. Extensive experiments based on diverse public datasets do not reveal a single method with a clear advantage over the others. Although DICE emerges as the best-performing method overall, it exhibits significant shortcomings on some datasets, emphasizing the complexity of this under-explored and challenging topic. Our findings shed light on the complex nature of OOD detection under the MIL framework, emphasizing the importance of developing novel, robust, and reliable methods that can generalize effectively in a weakly supervised context. The code for the paper is available here: https://github.com/loic-lb/OOD_MIL.

Via

Access Paper or Ask Questions

Spatio-Temporal Analysis of Patient-Derived Organoid Videos Using Deep Learning for the Prediction of Drug Efficacy

Aug 28, 2023

Leo Fillioux, Emilie Gontran, Jérôme Cartry, Jacques RR Mathieu, Sabrina Bedja, Alice Boilève, Paul-Henry Cournède, Fanny Jaulin, Stergios Christodoulidis, Maria Vakalopoulou

Figure 1 for Spatio-Temporal Analysis of Patient-Derived Organoid Videos Using Deep Learning for the Prediction of Drug Efficacy

Figure 2 for Spatio-Temporal Analysis of Patient-Derived Organoid Videos Using Deep Learning for the Prediction of Drug Efficacy

Figure 3 for Spatio-Temporal Analysis of Patient-Derived Organoid Videos Using Deep Learning for the Prediction of Drug Efficacy

Figure 4 for Spatio-Temporal Analysis of Patient-Derived Organoid Videos Using Deep Learning for the Prediction of Drug Efficacy

Abstract:Over the last ten years, Patient-Derived Organoids (PDOs) emerged as the most reliable technology to generate ex-vivo tumor avatars. PDOs retain the main characteristics of their original tumor, making them a system of choice for pre-clinical and clinical studies. In particular, PDOs are attracting interest in the field of Functional Precision Medicine (FPM), which is based upon an ex-vivo drug test in which living tumor cells (such as PDOs) from a specific patient are exposed to a panel of anti-cancer drugs. Currently, the Adenosine Triphosphate (ATP) based cell viability assay is the gold standard test to assess the sensitivity of PDOs to drugs. The readout is measured at the end of the assay from a global PDO population and therefore does not capture single PDO responses and does not provide time resolution of drug effect. To this end, in this study, we explore for the first time the use of powerful large foundation models for the automatic processing of PDO data. In particular, we propose a novel imaging-based high-throughput screening method to assess real-time drug efficacy from a time-lapse microscopy video of PDOs. The recently proposed SAM algorithm for segmentation and DINOv2 model are adapted in a comprehensive pipeline for processing PDO microscopy frames. Moreover, an attention mechanism is proposed for fusing temporal and spatial features in a multiple instance learning setting to predict ATP. We report better results than other non-time-resolved methods, indicating that the temporality of data is an important factor for the prediction of ATP. Extensive ablations shed light on optimizing the experimental setting and automating the prediction both in real-time and for forecasting.

Via

Access Paper or Ask Questions

Structured State Space Models for Multiple Instance Learning in Digital Pathology

Jun 27, 2023

Leo Fillioux, Joseph Boyd, Maria Vakalopoulou, Paul-Henry Cournède, Stergios Christodoulidis

Figure 1 for Structured State Space Models for Multiple Instance Learning in Digital Pathology

Figure 2 for Structured State Space Models for Multiple Instance Learning in Digital Pathology

Figure 3 for Structured State Space Models for Multiple Instance Learning in Digital Pathology

Figure 4 for Structured State Space Models for Multiple Instance Learning in Digital Pathology

Abstract:Multiple instance learning is an ideal mode of analysis for histopathology data, where vast whole slide images are typically annotated with a single global label. In such cases, a whole slide image is modelled as a collection of tissue patches to be aggregated and classified. Common models for performing this classification include recurrent neural networks and transformers. Although powerful compression algorithms, such as deep pre-trained neural networks, are used to reduce the dimensionality of each patch, the sequences arising from whole slide images remain excessively long, routinely containing tens of thousands of patches. Structured state space models are an emerging alternative for sequence modelling, specifically designed for the efficient modelling of long sequences. These models invoke an optimal projection of an input sequence into memory units that compress the entire sequence. In this paper, we propose the use of state space models as a multiple instance learner to a variety of problems in digital pathology. Across experiments in metastasis detection, cancer subtyping, mutation classification, and multitask learning, we demonstrate the competitiveness of this new class of models with existing state of the art approaches. Our code is available at https://github.com/MICS-Lab/s4_digital_pathology.

Via

Access Paper or Ask Questions

Artifact Removal in Histopathology Images

Dec 16, 2022

Cameron Dahan, Stergios Christodoulidis, Maria Vakalopoulou, Joseph Boyd

Figure 1 for Artifact Removal in Histopathology Images

Figure 2 for Artifact Removal in Histopathology Images

Figure 3 for Artifact Removal in Histopathology Images

Figure 4 for Artifact Removal in Histopathology Images

Abstract:In the clinical setting of histopathology, whole-slide image (WSI) artifacts frequently arise, distorting regions of interest, and having a pernicious impact on WSI analysis. Image-to-image translation networks such as CycleGANs are in principle capable of learning an artifact removal function from unpaired data. However, we identify a surjection problem with artifact removal, and propose an weakly-supervised extension to CycleGAN to address this. We assemble a pan-cancer dataset comprising artifact and clean tiles from the TCGA database. Promising results highlight the soundness of our method.

* Corrected typos, small modification of Figure 1 (+ reflected in Section 2.1), results unchanged

Via

Access Paper or Ask Questions

Region-guided CycleGANs for Stain Transfer in Whole Slide Images

Aug 26, 2022

Joseph Boyd, Irène Villa, Marie-Christine Mathieu, Eric Deutsch, Nikos Paragios, Maria Vakalopoulou, Stergios Christodoulidis

Figure 1 for Region-guided CycleGANs for Stain Transfer in Whole Slide Images

Figure 2 for Region-guided CycleGANs for Stain Transfer in Whole Slide Images

Figure 3 for Region-guided CycleGANs for Stain Transfer in Whole Slide Images

Figure 4 for Region-guided CycleGANs for Stain Transfer in Whole Slide Images

Abstract:In whole slide imaging, commonly used staining techniques based on hematoxylin and eosin (H&E) and immunohistochemistry (IHC) stains accentuate different aspects of the tissue landscape. In the case of detecting metastases, IHC provides a distinct readout that is readily interpretable by pathologists. IHC, however, is a more expensive approach and not available at all medical centers. Virtually generating IHC images from H&E using deep neural networks thus becomes an attractive alternative. Deep generative models such as CycleGANs learn a semantically-consistent mapping between two image domains, while emulating the textural properties of each domain. They are therefore a suitable choice for stain transfer applications. However, they remain fully unsupervised, and possess no mechanism for enforcing biological consistency in stain transfer. In this paper, we propose an extension to CycleGANs in the form of a region of interest discriminator. This allows the CycleGAN to learn from unpaired datasets where, in addition, there is a partial annotation of objects for which one wishes to enforce consistency. We present a use case on whole slide images, where an IHC stain provides an experimentally generated signal for metastatic cells. We demonstrate the superiority of our approach over prior art in stain transfer on histopathology tiles over two datasets. Our code and model are available at https://github.com/jcboyd/miccai2022-roigan.

Via

Access Paper or Ask Questions

Self-Supervised Representation Learning using Visual Field Expansion on Digital Pathology

Sep 07, 2021

Joseph Boyd, Mykola Liashuha, Eric Deutsch, Nikos Paragios, Stergios Christodoulidis, Maria Vakalopoulou

Figure 1 for Self-Supervised Representation Learning using Visual Field Expansion on Digital Pathology

Figure 2 for Self-Supervised Representation Learning using Visual Field Expansion on Digital Pathology

Figure 3 for Self-Supervised Representation Learning using Visual Field Expansion on Digital Pathology

Figure 4 for Self-Supervised Representation Learning using Visual Field Expansion on Digital Pathology

Abstract:The examination of histopathology images is considered to be the gold standard for the diagnosis and stratification of cancer patients. A key challenge in the analysis of such images is their size, which can run into the gigapixels and can require tedious screening by clinicians. With the recent advances in computational medicine, automatic tools have been proposed to assist clinicians in their everyday practice. Such tools typically process these large images by slicing them into tiles that can then be encoded and utilized for different clinical models. In this study, we propose a novel generative framework that can learn powerful representations for such tiles by learning to plausibly expand their visual field. In particular, we developed a progressively grown generative model with the objective of visual field expansion. Thus trained, our model learns to generate different tissue types with fine details, while simultaneously learning powerful representations that can be used for different clinical endpoints, all in a self-supervised way. To evaluate the performance of our model, we conducted classification experiments on CAMELYON17 and CRC benchmark datasets, comparing favorably to other self-supervised and pre-trained strategies that are commonly used in digital pathology. Our code is available at https://github.com/jcboyd/cdpath21-gan.

Via

Access Paper or Ask Questions

Exploring Deep Registration Latent Spaces

Jul 23, 2021

Théo Estienne, Maria Vakalopoulou, Stergios Christodoulidis, Enzo Battistella, Théophraste Henry, Marvin Lerousseau, Amaury Leroy, Guillaume Chassagnon, Marie-Pierre Revel, Nikos Paragios(+1 more)

Figure 1 for Exploring Deep Registration Latent Spaces

Figure 2 for Exploring Deep Registration Latent Spaces

Figure 3 for Exploring Deep Registration Latent Spaces

Figure 4 for Exploring Deep Registration Latent Spaces

Abstract:Explainability of deep neural networks is one of the most challenging and interesting problems in the field. In this study, we investigate the topic focusing on the interpretability of deep learning-based registration methods. In particular, with the appropriate model architecture and using a simple linear projection, we decompose the encoding space, generating a new basis, and we empirically show that this basis captures various decomposed anatomically aware geometrical transformations. We perform experiments using two different datasets focusing on lungs and hippocampus MRI. We show that such an approach can decompose the highly convoluted latent spaces of registration pipelines in an orthogonal space with several interesting properties. We hope that this work could shed some light on a better understanding of deep learning-based registration methods.

* 13 pages, 5 figures + 3 figures in supplementary materials Accepted to DART 2021 workshop

Via

Access Paper or Ask Questions