Title: Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction

Feb 29, 2024

Hao Li, Ying Chen, Yifei Chen, Wenxian Yang, Bowen Ding, Yuchen Han, Liansheng Wang, Rongshan Yu

Abstract: Whole Slide Image (WSI) classification is often formulated as a Multiple Instance Learning (MIL) problem. Recently, Vision-Language Models (VLMs) have demonstrated remarkable performance in WSI classification. However, existing methods leverage coarse-grained pathogenetic descriptions for visual representation supervision, which are insufficient to capture the complex visual appearance of pathogenetic images, hindering the generalizability of models on diverse downstream tasks. Additionally, processing high-resolution WSIs can be computationally expensive. In this paper, we propose a novel "Fine-grained Visual-Semantic Interaction" (FiVE) framework for WSI classification. It is designed to enhance the model's generalizability by leveraging the interplay between localized visual patterns and fine-grained pathological semantics. Specifically, with meticulously designed queries, we start by utilizing a large language model to extract fine-grained pathological descriptions from various non-standardized raw reports. The output descriptions are then reconstructed into fine-grained labels used for training. By introducing a Task-specific Fine-grained Semantics (TFS) module, we enable prompts to capture crucial visual information in WSIs, which enhances representation learning and significantly improves generalization. Furthermore, given that pathological visual patterns are redundantly distributed across tissue slices, we sample a subset of visual instances during training. Our method demonstrates robust generalizability and strong transferability, outperforming competing methods on the TCGA Lung Cancer dataset by at least 9.19% in accuracy in few-shot experiments.

* Accepted by CVPR 2024
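
The abstract mentions sampling a subset of visual instances during training because pathological patterns recur across a slide. As a rough illustration only, the sketch below treats a slide as a MIL bag of patch embeddings, draws a fixed-size subset, and pools it into one slide-level vector. The abstract does not specify the sampling strategy or the aggregator, so uniform random sampling and mean pooling are assumptions here, and `sample_instances` and `mean_pool` are hypothetical names rather than part of the FiVE code.

```python
import random
from typing import List, Sequence


def sample_instances(
    bag: Sequence[List[float]], k: int, rng: random.Random
) -> List[List[float]]:
    """Draw at most k patch embeddings, without replacement, from one slide's bag.

    Assumption: uniform random sampling. The paper's actual strategy may differ.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(bag) <= k:
        # Small bags are returned whole; nothing to subsample.
        return list(bag)
    return rng.sample(list(bag), k)


def mean_pool(instances: Sequence[List[float]]) -> List[float]:
    """Average instance embeddings into a single slide-level vector.

    Assumption: mean pooling stands in for whatever aggregator the model uses.
    """
    n = len(instances)
    dim = len(instances[0])
    return [sum(vec[d] for vec in instances) / n for d in range(dim)]


# Toy bag: 1000 two-dimensional "patch embeddings" for a single slide.
rng = random.Random(0)
bag = [[float(i), float(i) * 2.0] for i in range(1000)]

subset = sample_instances(bag, k=64, rng=rng)  # 64 instances instead of 1000
slide_vector = mean_pool(subset)               # one vector per slide
```

Under these assumptions, the per-slide cost of a training step scales with `k` rather than with the full number of patches, which is the computational motivation the abstract gives for subsampling.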