Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

João C. Neves

CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification

Jan 21, 2025

Cristiano Patrício, Isabel Rio-Torto, Jaime S. Cardoso, Luís F. Teixeira, João C. Neves

Figure 1 for CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification

Figure 2 for CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification

Figure 3 for CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification

Figure 4 for CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification

Abstract:The main challenges limiting the adoption of deep learning-based solutions in medical workflows are the availability of annotated data and the lack of interpretability of such systems. Concept Bottleneck Models (CBMs) tackle the latter by constraining the final disease prediction on a set of predefined and human-interpretable concepts. However, the increased interpretability achieved through these concept-based explanations implies a higher annotation burden. Moreover, if a new concept needs to be added, the whole system needs to be retrained. Inspired by the remarkable performance shown by Large Vision-Language Models (LVLMs) in few-shot settings, we propose a simple, yet effective, methodology, CBVLM, which tackles both of the aforementioned challenges. First, for each concept, we prompt the LVLM to answer if the concept is present in the input image. Then, we ask the LVLM to classify the image based on the previous concept predictions. Moreover, in both stages, we incorporate a retrieval module responsible for selecting the best examples for in-context learning. By grounding the final diagnosis on the predicted concepts, we ensure explainability, and by leveraging the few-shot capabilities of LVLMs, we drastically lower the annotation cost. We validate our approach with extensive experiments across four medical datasets and twelve LVLMs (both generic and medical) and show that CBVLM consistently outperforms CBMs and task-specific supervised methods without requiring any training and using just a few annotated examples. More information on our project page: https://cristianopatricio.github.io/CBVLM/.

* This work has been submitted to the IEEE for possible publication

Via

Access Paper or Ask Questions

A Two-Step Concept-Based Approach for Enhanced Interpretability and Trust in Skin Lesion Diagnosis

Nov 08, 2024

Cristiano Patrício, Luís F. Teixeira, João C. Neves

Abstract:The main challenges hindering the adoption of deep learning-based systems in clinical settings are the scarcity of annotated data and the lack of interpretability and trust in these systems. Concept Bottleneck Models (CBMs) offer inherent interpretability by constraining the final disease prediction on a set of human-understandable concepts. However, this inherent interpretability comes at the cost of greater annotation burden. Additionally, adding new concepts requires retraining the entire system. In this work, we introduce a novel two-step methodology that addresses both of these challenges. By simulating the two stages of a CBM, we utilize a pretrained Vision Language Model (VLM) to automatically predict clinical concepts, and a Large Language Model (LLM) to generate disease diagnoses based on the predicted concepts. We validate our approach on three skin lesion datasets, demonstrating that it outperforms traditional CBMs and state-of-the-art explainable methods, all without requiring any training and utilizing only a few annotated examples. The code is available at https://github.com/CristianoPatricio/2-step-concept-based-skin-diagnosis.

* Preprint submitted for review

Via

Access Paper or Ask Questions

Multi-Feature Aggregation in Diffusion Models for Enhanced Face Super-Resolution

Aug 27, 2024

Marcelo dos Santos, Rayson Laroca, Rafael O. Ribeiro, João C. Neves, David Menotti

Figure 1 for Multi-Feature Aggregation in Diffusion Models for Enhanced Face Super-Resolution

Figure 2 for Multi-Feature Aggregation in Diffusion Models for Enhanced Face Super-Resolution

Figure 3 for Multi-Feature Aggregation in Diffusion Models for Enhanced Face Super-Resolution

Figure 4 for Multi-Feature Aggregation in Diffusion Models for Enhanced Face Super-Resolution

Abstract:Super-resolution algorithms often struggle with images from surveillance environments due to adverse conditions such as unknown degradation, variations in pose, irregular illumination, and occlusions. However, acquiring multiple images, even of low quality, is possible with surveillance cameras. In this work, we develop an algorithm based on diffusion models that utilize a low-resolution image combined with features extracted from multiple low-quality images to generate a super-resolved image while minimizing distortions in the individual's identity. Unlike other algorithms, our approach recovers facial features without explicitly providing attribute information or without the need to calculate a gradient of a function during the reconstruction process. To the best of our knowledge, this is the first time multi-features combined with low-resolution images are used as conditioners to generate more reliable super-resolution images using stochastic differential equations. The FFHQ dataset was employed for training, resulting in state-of-the-art performance in facial recognition and verification metrics when evaluated on the CelebA and Quis-Campi datasets. Our code is publicly available at https://github.com/marcelowds/fasr

* Accepted for presentation at the Conference on Graphics, Patterns and Images (SIBGRAPI) 2024

Via

Access Paper or Ask Questions

Unsupervised Contrastive Analysis for Salient Pattern Detection using Conditional Diffusion Models

Jun 04, 2024

Cristiano Patrício, Carlo Alberto Barbano, Attilio Fiandrotti, Riccardo Renzulli, Marco Grangetto, Luis F. Teixeira, João C. Neves

Figure 1 for Unsupervised Contrastive Analysis for Salient Pattern Detection using Conditional Diffusion Models

Figure 2 for Unsupervised Contrastive Analysis for Salient Pattern Detection using Conditional Diffusion Models

Figure 3 for Unsupervised Contrastive Analysis for Salient Pattern Detection using Conditional Diffusion Models

Figure 4 for Unsupervised Contrastive Analysis for Salient Pattern Detection using Conditional Diffusion Models

Abstract:Contrastive Analysis (CA) regards the problem of identifying patterns in images that allow distinguishing between a background (BG) dataset (i.e. healthy subjects) and a target (TG) dataset (i.e. unhealthy subjects). Recent works on this topic rely on variational autoencoders (VAE) or contrastive learning strategies to learn the patterns that separate TG samples from BG samples in a supervised manner. However, the dependency on target (unhealthy) samples can be challenging in medical scenarios due to their limited availability. Also, the blurred reconstructions of VAEs lack utility and interpretability. In this work, we redefine the CA task by employing a self-supervised contrastive encoder to learn a latent representation encoding only common patterns from input images, using samples exclusively from the BG dataset during training, and approximating the distribution of the target patterns by leveraging data augmentation techniques. Subsequently, we exploit state-of-the-art generative methods, i.e. diffusion models, conditioned on the learned latent representation to produce a realistic (healthy) version of the input image encoding solely the common patterns. Thorough validation on a facial image dataset and experiments across three brain MRI datasets demonstrate that conditioning the generative process of state-of-the-art generative methods with the latent representation from our self-supervised contrastive encoder yields improvements in the generated image quality and in the accuracy of image classification. The code is available at https://github.com/CristianoPatricio/unsupervised-contrastive-cond-diff.

* 18 pages, 11 figures

Via

Access Paper or Ask Questions

Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models

Nov 24, 2023

Cristiano Patrício, Luís F. Teixeira, João C. Neves

Abstract:Concept-based models naturally lend themselves to the development of inherently interpretable skin lesion diagnosis, as medical experts make decisions based on a set of visual patterns of the lesion. Nevertheless, the development of these models depends on the existence of concept-annotated datasets, whose availability is scarce due to the specialized knowledge and expertise required in the annotation process. In this work, we show that vision-language models can be used to alleviate the dependence on a large number of concept-annotated samples. In particular, we propose an embedding learning strategy to adapt CLIP to the downstream task of skin lesion classification using concept-based descriptions as textual embeddings. Our experiments reveal that vision-language models not only attain better accuracy when using concepts as textual embeddings, but also require a smaller number of concept-annotated samples to attain comparable performance to approaches specifically devised for automatic concept generation.

* 5 pages

Via

Access Paper or Ask Questions

Coherent Concept-based Explanations in Medical Image and Its Application to Skin Lesion Diagnosis

Apr 17, 2023

Cristiano Patrício, João C. Neves, Luís F. Teixeira

Figure 1 for Coherent Concept-based Explanations in Medical Image and Its Application to Skin Lesion Diagnosis

Figure 2 for Coherent Concept-based Explanations in Medical Image and Its Application to Skin Lesion Diagnosis

Figure 3 for Coherent Concept-based Explanations in Medical Image and Its Application to Skin Lesion Diagnosis

Figure 4 for Coherent Concept-based Explanations in Medical Image and Its Application to Skin Lesion Diagnosis

Abstract:Early detection of melanoma is crucial for preventing severe complications and increasing the chances of successful treatment. Existing deep learning approaches for melanoma skin lesion diagnosis are deemed black-box models, as they omit the rationale behind the model prediction, compromising the trustworthiness and acceptability of these diagnostic methods. Attempts to provide concept-based explanations are based on post-hoc approaches, which depend on an additional model to derive interpretations. In this paper, we propose an inherently interpretable framework to improve the interpretability of concept-based models by incorporating a hard attention mechanism and a coherence loss term to assure the visual coherence of concept activations by the concept encoder, without requiring the supervision of additional annotations. The proposed framework explains its decision in terms of human-interpretable concepts and their respective contribution to the final prediction, as well as a visual interpretation of the locations where the concept is present in the image. Experiments on skin image datasets demonstrate that our method outperforms existing black-box and concept-based models for skin lesion classification.

* Under IEEE Copyright. Accepted for publication at CVPR 2023 Workshop Safe Artificial Intelligence for All Domains (SAIAD)

Via

Access Paper or Ask Questions

Explainable Deep Learning Methods in Medical Diagnosis: A Survey

May 10, 2022

Cristiano Patrício, João C. Neves, Luís F. Teixeira

Abstract:The remarkable success of deep learning has prompted interest in its application to medical diagnosis. Even tough state-of-the-art deep learning models have achieved human-level accuracy on the classification of different types of medical data, these models are hardly adopted in clinical workflows, mainly due to their lack of interpretability. The black-box-ness of deep learning models has raised the need for devising strategies to explain the decision process of these models, leading to the creation of the topic of eXplainable Artificial Intelligence (XAI). In this context, we provide a thorough survey of XAI applied to medical diagnosis, including visual, textual, and example-based explanation methods. Moreover, this work reviews the existing medical imaging datasets and the existing metrics for evaluating the quality of the explanations . Complementary to most existing surveys, we include a performance comparison among a set of report generation-based methods. Finally, the major challenges in applying XAI to medical imaging are also discussed.

Via

Access Paper or Ask Questions

Real or Fake? Spoofing State-Of-The-Art Face Synthesis Detection Systems

Nov 15, 2019

João C. Neves, Ruben Tolosana, Ruben Vera-Rodriguez, Vasco Lopes, Hugo Proença

Figure 1 for Real or Fake? Spoofing State-Of-The-Art Face Synthesis Detection Systems

Figure 2 for Real or Fake? Spoofing State-Of-The-Art Face Synthesis Detection Systems

Figure 3 for Real or Fake? Spoofing State-Of-The-Art Face Synthesis Detection Systems

Figure 4 for Real or Fake? Spoofing State-Of-The-Art Face Synthesis Detection Systems

Abstract:The availability of large-scale facial databases, together with the remarkable progresses of deep learning technologies, in particular Generative Adversarial Networks (GANs), have led to the generation of extremely realistic fake facial content, which raises obvious concerns about the potential for misuse. These concerns have fostered the research of manipulation detection methods that, contrary to humans, have already achieved astonishing results in some scenarios. In this study, we focus on the entire face synthesis, which is one specific type of facial manipulation. The main contributions of this study are: i) a novel strategy to remove GAN "fingerprints" from synthetic fake images in order to spoof facial manipulation detection systems, while keeping the visual quality of the resulting images, ii) an in-depth analysis of state-of-the-art detection approaches for the entire face synthesis manipulation, iii) a complete experimental assessment of this type of facial manipulation considering state-of-the-art detection systems, remarking how challenging is this task in unconstrained scenarios, and finally iv) a novel public database named FSRemovalDB produced after applying our proposed GAN-fingerprint removal approach to original synthetic fake images. The observed results led us to conclude that more efforts are required to develop robust facial manipulation detection systems against unseen conditions and spoof techniques such as the one proposed in this study.

Via

Access Paper or Ask Questions