Abstract:This work proposes a novel approach beyond supervised learning for effective pathological image analysis, addressing the challenge of limited robust labeled data. Pathological diagnosis of diseases like cancer has conventionally relied on the evaluation of morphological features by physicians and pathologists. However, recent advancements in compute-aided diagnosis (CAD) systems are gaining significant attention as diagnostic support tools. Although the advancement of deep learning has improved CAD significantly, segmentation models typically require large pixel-level annotated dataset, and such labeling is expensive. Existing studies not based on supervised approaches still struggle with limited generalization, and no practical approach has emerged yet. To address this issue, we present a weakly supervised semantic segmentation (WSSS) model by combining class activation map and Segment Anything Model (SAM)-based pseudo-labeling. For effective pretraining, we adopt the SAM-a foundation model that is pretrained on large datasets and operates in zero-shot configurations using only coarse prompts. The proposed approach transfer enhanced Attention Dropout Layer's knowledge to SAM, thereby generating pseudo-labels. To demonstrate the superiority of the proposed method, experimental studies are conducted on histopathological breast cancer datasets. The proposed method outperformed other WSSS methods across three datasets, demonstrating its efficiency by achieving this with only 12GB of GPU memory during training. Our code is available at : https://github.com/QI-NemoSong/EP-SAM
Abstract:This work proposes a novel approach beyond supervised learning for effective pathological image analysis, addressing the challenge of limited robust labeled data. Pathological diagnosis of diseases like cancer has conventionally relied on the evaluation of morphological features by physicians and pathologists. However, recent advancements in compute-aided diagnosis (CAD) systems are gaining significant attention as diagnostic support tools. Although the advancement of deep learning has improved CAD significantly, segmentation models typically require large pixel-level annotated dataset, and such labeling is expensive. Existing studies not based on supervised approaches still struggle with limited generalization, and no practical approach has emerged yet. To address this issue, we present a weakly supervised semantic segmentation (WSSS) model by combining class activation map and Segment Anything Model (SAM)-based pseudo-labeling. For effective pretraining, we adopt the SAM-a foundation model that is pretrained on large datasets and operates in zero-shot configurations using only coarse prompts. The proposed approach transfer enhanced Attention Dropout Layer's knowledge to SAM, thereby generating pseudo-labels. To demonstrate the superiority of the proposed method, experimental studies are conducted on histopathological breast cancer datasets. The proposed method outperformed other WSSS methods across three datasets, demonstrating its efficiency by achieving this with only 12GB of GPU memory during training. Our code is available at : https://github.com/QI-NemoSong/EPLC-SAM
Abstract:The Foundation model for image segmentation, Segment Anything (SAM), has been actively researched in various fields since its proposal. Various researches have been proposed to adapt SAM to specific domains, with one notable approach involving the addition and training of lightweight adapter modules. While adapter-based fine-tuning approaches have reported parameter efficiency and significant performance improvements, they face a often overlooked issue: the excessive consumption of GPU memory relative to the number of trainable parameters. Addressing this issue, this paper proposes a memory-efficient parallel convolutional adapter architecture. This architecture connects in parallel with SAM's image encoder, eliminating the need to store activations and gradients of the image encoder during model training. Our proposed architecture demonstrated competitive experimental results while using less than half the GPU memory compared to SAM Adapter, indicating its value as an alternative to simple decoder fine-tuning when hardware limitations preclude adapter-based learning. Our code implementation is available at our github.