Abstract: Ensuring the reliability of machine learning models in safety-critical domains such as healthcare requires auditing methods that can uncover model shortcomings. We introduce a method for identifying important visual concepts within large multimodal models (LMMs) and use it to investigate the behaviors these models exhibit when prompted with medical tasks. We focus primarily on classifying malignant skin lesions from clinical dermatology images, with supplemental experiments on chest radiographs and natural images. After showing that LMMs display unexpected performance gaps between demographic subgroups when prompted with demonstration examples, we apply our method, Visual Concept Ranking (VCR), to these models and prompts. VCR generates hypotheses about the visual features the models depend on, which we then validate with manual interventions.
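
To make the prompting setup concrete, the sketch below shows one way a few-shot (in-context) query for skin-lesion malignancy classification might be assembled. The message schema and the `query_lmm` call are illustrative assumptions, not the interface used in this work.

```python
import base64
from pathlib import Path

def encode_image(path: str) -> str:
    """Base64-encode an image so it can be embedded in a multimodal prompt."""
    return base64.b64encode(Path(path).read_bytes()).decode("utf-8")

def build_fewshot_prompt(demos: list[tuple[str, str]], query_image: str) -> list[dict]:
    """Assemble an interleaved image/text prompt from (image_path, label) demonstrations.

    The chat-style message format here is generic; real LMM APIs differ in how
    images are attached.
    """
    content = [{"type": "text",
                "text": "Classify each dermatology image as 'malignant' or 'benign'."}]
    for image_path, label in demos:
        content.append({"type": "image", "data": encode_image(image_path)})
        content.append({"type": "text", "text": f"Label: {label}"})
    content.append({"type": "image", "data": encode_image(query_image)})
    content.append({"type": "text", "text": "Label:"})
    return [{"role": "user", "content": content}]

# Hypothetical usage; query_lmm stands in for whatever model client is available.
# demos = [("lesion_001.jpg", "benign"), ("lesion_002.jpg", "malignant")]
# answer = query_lmm(build_fewshot_prompt(demos, "lesion_query.jpg"))
```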




Abstract: Vision-language models (VLMs) show promise in medical diagnosis, but their performance across demographic subgroups when using in-context learning (ICL) remains poorly understood. We examine how the demographic composition of demonstration examples affects VLM performance on two medical imaging tasks: skin lesion malignancy prediction and pneumothorax detection from chest radiographs. Our analysis reveals that ICL influences model predictions through multiple mechanisms: (1) ICL allows VLMs to learn subgroup-specific disease base rates from the prompt, and (2) ICL leads VLMs to make predictions that perform differently across demographic groups, even after controlling for subgroup-specific disease base rates. Our empirical results inform best practices for prompting current VLMs (specifically, examining demographic subgroup performance and matching demonstration label base rates to the target distribution both overall and within subgroups), while also suggesting next steps for improving our theoretical understanding of these models.
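
As a concrete illustration of the base-rate-matching recommendation, the sketch below samples a demonstration set whose positive-label rate matches a target prevalence within each demographic subgroup (and therefore overall). The field names (`subgroup`, `label`) and the helper itself are assumptions for illustration, not the sampling procedure used in the experiments.

```python
import random
from collections import defaultdict

def sample_matched_demos(pool, n_per_group, target_prevalence, seed=0):
    """Draw demonstrations so each subgroup's positive rate matches target_prevalence.

    pool: list of dicts with keys 'image', 'label' (0/1), and 'subgroup'
        (field names are illustrative assumptions).
    n_per_group: number of demonstrations to draw per subgroup.
    target_prevalence: desired fraction of positive labels (e.g., the deployment
        prevalence), enforced within every subgroup and hence in the pooled set.
    """
    rng = random.Random(seed)
    by_group = defaultdict(lambda: {0: [], 1: []})
    for ex in pool:
        by_group[ex["subgroup"]][ex["label"]].append(ex)

    demos = []
    for group, strata in by_group.items():
        n_pos = round(n_per_group * target_prevalence)
        n_neg = n_per_group - n_pos
        if len(strata[1]) < n_pos or len(strata[0]) < n_neg:
            raise ValueError(f"Not enough examples in subgroup {group!r}")
        demos += rng.sample(strata[1], n_pos) + rng.sample(strata[0], n_neg)

    rng.shuffle(demos)  # avoid label or subgroup ordering effects in the prompt
    return demos
```

Because the same prevalence is enforced in every subgroup, the pooled demonstration set matches it as well; per-subgroup evaluation of the resulting predictions is still needed to surface the residual performance gaps described above.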