Title: Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis

Oct 18, 2024

Honglin Li, Yunlong Zhang, Pingyi Chen, Zhongyi Shui, Chenglu Zhu, Lin Yang

Abstract: Histopathology Whole Slide Image (WSI) analysis serves as the gold standard for clinical cancer diagnosis in the daily routines of doctors. To develop computer-aided diagnosis models for WSIs, previous methods typically employ Multi-Instance Learning (MIL) to enable slide-level prediction given only slide-level labels. Among these models, vanilla attention mechanisms without pairwise interactions have traditionally been employed, but they are unable to model contextual information. More recently, self-attention models have been used to address this issue. To alleviate the computational cost of long sequences in large WSIs, methods such as HIPT use region slicing, and TransMIL approximates full self-attention. Both approaches suffer from suboptimal performance due to the loss of key information. Moreover, their absolute positional embeddings struggle to handle long contextual dependencies in WSIs of varying shapes. In this paper, we first analyze how the low-rank nature of the long-sequence attention matrix constrains the representation ability of WSI modelling. We then demonstrate that the rank of the attention matrix can be improved by focusing on local interactions via a local attention mask. Our analysis shows that the local mask aligns with the attention patterns in the lower layers of the Transformer. Furthermore, the local attention mask can be applied during chunked attention calculation, reducing the quadratic computational complexity to linear for a small local bandwidth. Building on this, we propose a local-global hybrid Transformer that both accelerates computation and models local-global information interactions. Our method, Long-contextual MIL (LongMIL), is evaluated through extensive experiments on various WSI tasks to validate its superiority. Our code will be available at github.com/invoker-LL/Long-MIL.
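The abstract's computational claim is that a local (banded) attention mask can be applied inside chunked attention, so each block of queries only touches keys within the bandwidth and total cost grows linearly with sequence length. Below is a minimal NumPy sketch of that general idea, not the LongMIL implementation. It assumes single-head attention over a 1-D ordering of patch tokens and a hypothetical bandwidth `w` measured in tokens; in a WSI, locality would more naturally be defined over the 2-D patch grid, so 1-D distance is used here only as a simplification. The function names `dense_local_attention` and `chunked_local_attention` are illustrative. The sketch checks that the chunked version reproduces dense masked attention.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf receive exactly zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dense_local_attention(q, k, v, w):
    """Reference version: build the full n x n score matrix, then mask |i - j| > w.

    Memory and time are O(n^2); used only to check the chunked version.
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    idx = np.arange(n)
    band = np.abs(idx[:, None] - idx[None, :]) <= w  # hypothetical 1-D local mask
    scores = np.where(band, scores, -np.inf)
    return softmax(scores) @ v


def chunked_local_attention(q, k, v, w, chunk=64):
    """Same output, but each query chunk only scores keys that can lie inside the band.

    Per chunk the work is O(chunk * (chunk + 2w) * d), so for fixed chunk and w
    the total cost is linear in n.
    """
    n, d = q.shape
    out = np.empty_like(v)
    for start in range(0, n, chunk):
        end = min(start + chunk, n)
        # Query i in [start, end) may attend to keys j with |i - j| <= w.
        k_lo, k_hi = max(0, start - w), min(n, end + w)
        scores = q[start:end] @ k[k_lo:k_hi].T / np.sqrt(d)
        qi = np.arange(start, end)[:, None]
        kj = np.arange(k_lo, k_hi)[None, :]
        # Keys at the edges of the window still need the exact band mask.
        scores = np.where(np.abs(qi - kj) <= w, scores, -np.inf)
        out[start:end] = softmax(scores) @ v[k_lo:k_hi]
    return out


# Usage: random stand-ins for n patch embeddings of dimension d.
rng = np.random.default_rng(0)
n, d, w = 500, 32, 8
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
assert np.allclose(dense_local_attention(q, k, v, w),
                   chunked_local_attention(q, k, v, w, chunk=64))
```

Every row keeps at least its diagonal entry, so no row is fully masked and the softmax never divides by zero. The abstract describes a local-global hybrid in which global interactions are also modelled; this sketch covers only the local, chunked part.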

* NeurIPS-2024. arXiv admin note: text overlap with arXiv:2311.12885
