Abstract: Multiple instance learning (MIL) is a robust paradigm for whole-slide pathological image (WSI) analysis, processing gigapixel-resolution images with only slide-level labels. As pioneering efforts, attention-based MIL (ABMIL) and its variants have become increasingly popular because they can simultaneously handle clinical diagnosis and tumor localization. However, the attention mechanism has limited ability to discriminate between instances, which often leads to misclassified tissues and can impair MIL performance. This paper proposes an Attribute-Driven MIL (AttriMIL) framework to address these issues. Concretely, we dissect the calculation process of ABMIL and present an attribute scoring mechanism that effectively measures the contribution of each instance to the bag prediction, thereby quantifying instance attributes. Building on this attribute quantification, we develop a spatial attribute constraint and an attribute ranking constraint to model instance correlations within and across slides, respectively. These constraints encourage the network to capture the spatial correlation and semantic similarity of instances, improving AttriMIL's ability to distinguish tissue types and identify challenging instances. Additionally, AttriMIL employs a histopathology-adaptive backbone that maximizes the pre-trained model's feature extraction capability for collecting pathological features. Extensive experiments on three public benchmarks demonstrate that AttriMIL outperforms existing state-of-the-art frameworks across multiple evaluation metrics. The implementation code is available at https://github.com/MedCAI/AttriMIL.
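
For intuition, below is a minimal sketch of gated attention-based MIL pooling with a per-instance "attribute" score formed from the unnormalized attention and the instance's class evidence. The module name `AttributeScoringHead`, the tensor shapes, and the exact scoring formula are illustrative assumptions, not the released AttriMIL implementation.

```python
# Sketch: ABMIL-style gated attention pooling plus an illustrative per-instance
# attribute score (raw attention x per-instance class logits). Names and the
# scoring rule are hypothetical, for exposition only.
import torch
import torch.nn as nn

class AttributeScoringHead(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=128, num_classes=2):
        super().__init__()
        # Gated attention branch (as in ABMIL, Ilse et al., 2018)
        self.attn_V = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.Tanh())
        self.attn_U = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.Sigmoid())
        self.attn_w = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feats):  # feats: (N, feat_dim), all instances of one bag
        a = self.attn_w(self.attn_V(feats) * self.attn_U(feats))   # (N, 1) raw attention
        attn = torch.softmax(a, dim=0)                              # normalize over instances
        bag_feat = (attn * feats).sum(dim=0, keepdim=True)          # (1, feat_dim)
        bag_logits = self.classifier(bag_feat)                      # slide-level prediction
        # One plausible attribute score: raw attention times the instance's class
        # evidence, i.e., how much this instance pushes the bag toward each class.
        attr_scores = a * self.classifier(feats)                    # (N, num_classes)
        return bag_logits, attn, attr_scores
```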
Abstract: Positron emission tomography (PET) combined with computed tomography (CT) imaging is routinely used in cancer diagnosis and prognosis, as the two modalities provide complementary information. Automatically segmenting tumors in PET/CT images can significantly improve examination efficiency. Traditional multi-modal segmentation solutions mainly rely on concatenation operations for modality fusion, which fail to effectively model the non-linear dependencies between the PET and CT modalities. Recent studies have investigated various approaches to optimize the fusion of modality-specific features and enhance joint representations. However, the modality-specific encoders used in these methods operate independently and thus inadequately leverage the synergistic relationships inherent in PET and CT, for example, the complementarity between semantics and structure. To address these issues, we propose a Hierarchical Adaptive Interaction and Weighting Network, termed H2ASeg, to explore intrinsic cross-modal correlations and transfer potentially complementary information. Specifically, we design a Modality-Cooperative Spatial Attention (MCSA) module that performs intra- and inter-modal interactions both globally and locally. Additionally, a Target-Aware Modality Weighting (TAMW) module is developed to highlight tumor-related features within the multi-modal features, thereby refining tumor segmentation. By embedding these modules across different layers, H2ASeg hierarchically models cross-modal correlations, enabling a nuanced understanding of both semantic and structural tumor features. Extensive experiments demonstrate the superiority of H2ASeg, which outperforms state-of-the-art methods on the AutoPet-II and Hecktor2022 benchmarks. The code is released at https://github.com/JinPLu/H2ASeg.
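
As a rough illustration of target-aware modality weighting, the sketch below learns per-modality, per-channel weights from pooled PET and CT context and reweights each modality before fusion. The class name `ModalityWeighting` and the specific gating design are assumptions for exposition; they are not the released H2ASeg code.

```python
# Sketch: weight PET and CT feature channels by a learned gate computed from
# their global context, then fuse. Design details here are illustrative only.
import torch
import torch.nn as nn

class ModalityWeighting(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool3d(1)            # global context per modality
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels // 2), nn.ReLU(inplace=True),
            nn.Linear(channels // 2, 2 * channels), nn.Sigmoid(),
        )
        self.fuse = nn.Conv3d(2 * channels, channels, kernel_size=1)

    def forward(self, pet_feat, ct_feat):             # each: (B, C, D, H, W)
        b, c = pet_feat.shape[:2]
        ctx = torch.cat([self.gap(pet_feat), self.gap(ct_feat)], dim=1).flatten(1)  # (B, 2C)
        w = self.gate(ctx).view(b, 2 * c, 1, 1, 1)    # per-modality channel weights in (0, 1)
        weighted = torch.cat([pet_feat, ct_feat], dim=1) * w
        return self.fuse(weighted)                    # fused multi-modal features
```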
Abstract: Fluorodeoxyglucose (FDG) positron emission tomography (PET) combined with computed tomography (CT) is considered the primary solution for detecting some cancers, such as lung cancer and melanoma. Automatic segmentation of tumors in PET/CT images can help reduce doctors' workload and thereby improve diagnostic quality. However, precise tumor segmentation is challenging because many tumors are small and high-uptake normal regions resemble tumor regions. To address these issues, this paper proposes a localization-to-segmentation framework (L2SNet) for precise tumor segmentation. L2SNet first localizes possible lesions in the lesion localization phase and then uses the location cues to shape the segmentation results in the lesion segmentation phase. To further improve the segmentation performance of L2SNet, we design an adaptive threshold scheme that takes the segmentation results of both phases into consideration. Experiments on the MICCAI 2023 Automated Lesion Segmentation in Whole-Body FDG-PET/CT challenge dataset show that our method achieves a competitive result, ranking among the top 7 methods on the preliminary test set. Our work is available at: https://github.com/MedCAI/L2SNet.
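
One way such a two-phase, threshold-adaptive fusion could look is sketched below: the localization and segmentation probability maps are blended, and the binarization threshold adapts to the confidence present in the combined map. The weighting, the max-based rule, and the function name are illustrative assumptions, not L2SNet's exact scheme.

```python
# Sketch: fuse localization and segmentation probability maps with an adaptive
# threshold. All constants and the adaptation rule are illustrative assumptions.
import numpy as np

def adaptive_threshold_fusion(loc_prob, seg_prob, alpha=0.5, base_thr=0.5):
    """loc_prob, seg_prob: voxel-wise probabilities in [0, 1] from the two phases."""
    combined = alpha * loc_prob + (1.0 - alpha) * seg_prob
    # Adapt the binarization threshold to the confidence actually present in the
    # combined map, but never drop below the fixed base threshold.
    thr = max(base_thr, 0.5 * float(combined.max()))
    return (combined >= thr).astype(np.uint8)

# Usage with random maps standing in for real model outputs.
if __name__ == "__main__":
    loc = np.random.rand(64, 64, 64)
    seg = np.random.rand(64, 64, 64)
    mask = adaptive_threshold_fusion(loc, seg)
```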