Cristiano Patrício

CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification

Jan 21, 2025

Cristiano Patrício, Isabel Rio-Torto, Jaime S. Cardoso, Luís F. Teixeira, João C. Neves

Abstract: The main challenges limiting the adoption of deep learning-based solutions in medical workflows are the availability of annotated data and the lack of interpretability of such systems. Concept Bottleneck Models (CBMs) tackle the latter by constraining the final disease prediction on a set of predefined and human-interpretable concepts. However, the increased interpretability achieved through these concept-based explanations implies a higher annotation burden. Moreover, if a new concept needs to be added, the whole system needs to be retrained. Inspired by the remarkable performance shown by Large Vision-Language Models (LVLMs) in few-shot settings, we propose a simple, yet effective, methodology, CBVLM, which tackles both of the aforementioned challenges. First, for each concept, we prompt the LVLM to answer if the concept is present in the input image. Then, we ask the LVLM to classify the image based on the previous concept predictions. Moreover, in both stages, we incorporate a retrieval module responsible for selecting the best examples for in-context learning. By grounding the final diagnosis on the predicted concepts, we ensure explainability, and by leveraging the few-shot capabilities of LVLMs, we drastically lower the annotation cost. We validate our approach with extensive experiments across four medical datasets and twelve LVLMs (both generic and medical) and show that CBVLM consistently outperforms CBMs and task-specific supervised methods without requiring any training and using just a few annotated examples. More information on our project page: https://cristianopatricio.github.io/CBVLM/.

* This work has been submitted to the IEEE for possible publication
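
The two stages described in this abstract (per-concept prompting, then classification grounded on the predicted concepts, each with retrieved in-context examples) can be illustrated with a minimal sketch. It is not the authors' implementation: the `lvlm` callable, the prompt wording, the cosine-similarity retrieval, and the function names `retrieve` and `cbvlm_predict` are all assumptions made for illustration, and images are referred to by a text identifier instead of being passed to a real vision-language model.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, dict]  # (embedding, {"id", "concepts", "label"})


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: Vector, pool: List[Example], k: int) -> List[dict]:
    """Assumed retrieval module: the k annotated examples most similar to the query."""
    ranked = sorted(pool, key=lambda item: cosine(query, item[0]), reverse=True)
    return [meta for _, meta in ranked[:k]]


def describe(concepts: Dict[str, bool]) -> str:
    """Render concept predictions as text for the classification prompt."""
    return ", ".join(f"{n}: {'yes' if p else 'no'}" for n, p in concepts.items())


def cbvlm_predict(
    image_id: str,
    embedding: Vector,
    concepts: List[str],
    classes: List[str],
    pool: List[Example],
    lvlm: Callable[[str], str],  # hypothetical model interface: prompt in, text out
    k: int = 2,
) -> Tuple[Dict[str, bool], Optional[str]]:
    demos = retrieve(embedding, pool, k)

    # Stage 1: ask about each concept separately, with few-shot demonstrations.
    predicted: Dict[str, bool] = {}
    for concept in concepts:
        lines = [
            f"Image {d['id']}. Is '{concept}' present? "
            f"{'yes' if d['concepts'][concept] else 'no'}"
            for d in demos
        ]
        lines.append(f"Image {image_id}. Is '{concept}' present?")
        answer = lvlm("\n".join(lines))
        predicted[concept] = answer.strip().lower().startswith("yes")

    # Stage 2: classify from the predicted concepts, again with demonstrations.
    lines = [
        f"Concepts: {describe(d['concepts'])}. Diagnosis: {d['label']}"
        for d in demos
    ]
    lines.append(
        f"Concepts: {describe(predicted)}. Diagnosis ({' or '.join(classes)}):"
    )
    answer = lvlm("\n".join(lines)).strip().lower()
    # Naive answer parsing: first class name found in the reply, else None.
    label = next((c for c in classes if c.lower() in answer), None)
    return predicted, label
```

Because the diagnosis prompt is built only from the stage-1 answers, the returned concept dictionary doubles as the explanation for the returned label.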

A Two-Step Concept-Based Approach for Enhanced Interpretability and Trust in Skin Lesion Diagnosis

Nov 08, 2024

Cristiano Patrício, Luís F. Teixeira, João C. Neves

Abstract: The main challenges hindering the adoption of deep learning-based systems in clinical settings are the scarcity of annotated data and the lack of interpretability and trust in these systems. Concept Bottleneck Models (CBMs) offer inherent interpretability by constraining the final disease prediction on a set of human-understandable concepts. However, this inherent interpretability comes at the cost of greater annotation burden. Additionally, adding new concepts requires retraining the entire system. In this work, we introduce a novel two-step methodology that addresses both of these challenges. By simulating the two stages of a CBM, we utilize a pretrained Vision Language Model (VLM) to automatically predict clinical concepts, and a Large Language Model (LLM) to generate disease diagnoses based on the predicted concepts. We validate our approach on three skin lesion datasets, demonstrating that it outperforms traditional CBMs and state-of-the-art explainable methods, all without requiring any training and utilizing only a few annotated examples. The code is available at https://github.com/CristianoPatricio/2-step-concept-based-skin-diagnosis.

* Preprint submitted for review

Unsupervised Contrastive Analysis for Salient Pattern Detection using Conditional Diffusion Models

Jun 04, 2024

Cristiano Patrício, Carlo Alberto Barbano, Attilio Fiandrotti, Riccardo Renzulli, Marco Grangetto, Luis F. Teixeira, João C. Neves

Abstract: Contrastive Analysis (CA) addresses the problem of identifying patterns in images that allow distinguishing between a background (BG) dataset (e.g., healthy subjects) and a target (TG) dataset (e.g., unhealthy subjects). Recent works on this topic rely on variational autoencoders (VAE) or contrastive learning strategies to learn the patterns that separate TG samples from BG samples in a supervised manner. However, the dependence on target (unhealthy) samples can be challenging in medical scenarios due to their limited availability. Also, the blurred reconstructions of VAEs lack utility and interpretability. In this work, we redefine the CA task by employing a self-supervised contrastive encoder to learn a latent representation encoding only common patterns from input images, using samples exclusively from the BG dataset during training, and approximating the distribution of the target patterns by leveraging data augmentation techniques. Subsequently, we exploit state-of-the-art generative methods, i.e., diffusion models, conditioned on the learned latent representation to produce a realistic (healthy) version of the input image encoding solely the common patterns. Thorough validation on a facial image dataset and experiments across three brain MRI datasets demonstrate that conditioning the generative process of state-of-the-art generative methods with the latent representation from our self-supervised contrastive encoder yields improvements in the generated image quality and in the accuracy of image classification. The code is available at https://github.com/CristianoPatricio/unsupervised-contrastive-cond-diff.

* 18 pages, 11 figures

Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models

Nov 24, 2023

Cristiano Patrício, Luís F. Teixeira, João C. Neves

Abstract: Concept-based models naturally lend themselves to the development of inherently interpretable skin lesion diagnosis, as medical experts make decisions based on a set of visual patterns of the lesion. Nevertheless, the development of these models depends on the existence of concept-annotated datasets, which are scarce due to the specialized knowledge and expertise required in the annotation process. In this work, we show that vision-language models can be used to alleviate the dependence on a large number of concept-annotated samples. In particular, we propose an embedding learning strategy to adapt CLIP to the downstream task of skin lesion classification using concept-based descriptions as textual embeddings. Our experiments reveal that vision-language models not only attain better accuracy when using concepts as textual embeddings, but also require a smaller number of concept-annotated samples to attain comparable performance to approaches specifically devised for automatic concept generation.

* 5 pages

Coherent Concept-based Explanations in Medical Image and Its Application to Skin Lesion Diagnosis

Apr 17, 2023

Cristiano Patrício, João C. Neves, Luís F. Teixeira

Abstract: Early detection of melanoma is crucial for preventing severe complications and increasing the chances of successful treatment. Existing deep learning approaches for melanoma skin lesion diagnosis are deemed black-box models, as they omit the rationale behind the model prediction, compromising the trustworthiness and acceptability of these diagnostic methods. Attempts to provide concept-based explanations are based on post-hoc approaches, which depend on an additional model to derive interpretations. In this paper, we propose an inherently interpretable framework to improve the interpretability of concept-based models by incorporating a hard attention mechanism and a coherence loss term to assure the visual coherence of concept activations by the concept encoder, without requiring the supervision of additional annotations. The proposed framework explains its decision in terms of human-interpretable concepts and their respective contribution to the final prediction, as well as a visual interpretation of the locations where the concept is present in the image. Experiments on skin image datasets demonstrate that our method outperforms existing black-box and concept-based models for skin lesion classification.

* Under IEEE Copyright. Accepted for publication at CVPR 2023 Workshop Safe Artificial Intelligence for All Domains (SAIAD)

Explainable Deep Learning Methods in Medical Diagnosis: A Survey

May 10, 2022

Cristiano Patrício, João C. Neves, Luís F. Teixeira

Abstract: The remarkable success of deep learning has prompted interest in its application to medical diagnosis. Even though state-of-the-art deep learning models have achieved human-level accuracy on the classification of different types of medical data, these models are rarely adopted in clinical workflows, mainly due to their lack of interpretability. The black-box nature of deep learning models has raised the need for devising strategies to explain the decision process of these models, leading to the creation of the topic of eXplainable Artificial Intelligence (XAI). In this context, we provide a thorough survey of XAI applied to medical diagnosis, including visual, textual, and example-based explanation methods. Moreover, this work reviews the existing medical imaging datasets and the existing metrics for evaluating the quality of the explanations. Complementary to most existing surveys, we include a performance comparison among a set of report generation-based methods. Finally, the major challenges in applying XAI to medical imaging are also discussed.

ZSpeedL -- Evaluating the Performance of Zero-Shot Learning Methods using Low-Power Devices

Oct 09, 2021

Cristiano Patrício, João Neves

Abstract: The recognition of unseen objects from a semantic representation or textual description, usually denoted as zero-shot learning (ZSL), is more likely to be used in real-world scenarios than traditional object recognition. Nevertheless, no work has evaluated the feasibility of deploying zero-shot learning approaches in these scenarios, particularly when using low-power devices. In this paper, we provide the first benchmark on the inference time of zero-shot learning, comprising an evaluation of state-of-the-art approaches regarding their speed/accuracy trade-off. An analysis of the processing time of the different phases of the ZSL inference stage reveals that visual feature extraction is the major bottleneck in this paradigm, but we show that lightweight networks can dramatically reduce the overall inference time without reducing the accuracy obtained by the de facto ResNet101 architecture. Also, this benchmark evaluates how different ZSL approaches perform on low-power devices, and how the visual feature extraction phase could be optimized on this hardware. To foster the research and deployment of ZSL systems capable of operating in real-world scenarios, we release the evaluation framework used in this benchmark (https://github.com/CristianoPatricio/zsl-methods).

* 8 pages. Accepted at the 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2021
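
The per-phase timing analysis mentioned in this abstract can be sketched as a small harness that times each inference phase separately and reports the slowest one. This is an illustrative sketch only, not the released evaluation framework: the function names `time_phases` and `bottleneck`, the phase labels, and the stand-in phase functions in the usage example are assumptions.

```python
import time
from typing import Any, Callable, Dict, Sequence, Tuple

Phase = Tuple[str, Callable[[Any], Any]]


def time_phases(phases: Sequence[Phase], sample: Any, repeats: int = 5) -> Dict[str, float]:
    """Run the phases in order on one sample and return mean seconds per phase."""
    totals = {name: 0.0 for name, _ in phases}
    for _ in range(repeats):
        data = sample
        for name, fn in phases:
            start = time.perf_counter()
            data = fn(data)  # output of one phase feeds the next
            totals[name] += time.perf_counter() - start
    return {name: total / repeats for name, total in totals.items()}


def bottleneck(timings: Dict[str, float]) -> str:
    """Name of the phase with the largest mean time."""
    return max(timings, key=timings.get)


if __name__ == "__main__":
    # Stand-in phases; a real benchmark would call a feature extractor
    # (e.g. a CNN backbone) and a ZSL classifier here.
    def extract_features(image):
        time.sleep(0.02)  # simulate an expensive backbone
        return [0.0] * 8

    def zsl_classify(features):
        return 0

    timings = time_phases(
        [("visual_features", extract_features), ("zsl_inference", zsl_classify)],
        sample=None,
    )
    print(timings, "bottleneck:", bottleneck(timings))
```

Swapping the feature-extraction callable for a lighter network and rerunning the harness gives the kind of speed comparison the abstract describes, although accuracy would have to be measured separately.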
