Abstract:Survival prediction, utilizing pathological images and genomic profiles, is increasingly important in cancer analysis and prognosis. Despite significant progress, precise survival analysis still faces two main challenges: (1) The massive pixels contained in whole slide images (WSIs) complicate the process of pathological images, making it difficult to generate an effective representation of the tumor microenvironment (TME). (2) Existing multimodal methods often rely on alignment strategies to integrate complementary information, which may lead to information loss due to the inherent heterogeneity between pathology and genes. In this paper, we propose a Multimodal Cross-Task Interaction (MCTI) framework to explore the intrinsic correlations between subtype classification and survival analysis tasks. Specifically, to capture TME-related features in WSIs, we leverage the subtype classification task to mine tumor regions. Simultaneously, multi-head attention mechanisms are applied in genomic feature extraction, adaptively performing genes grouping to obtain task-related genomic embedding. With the joint representation of pathological images and genomic data, we further introduce a Transport-Guided Attention (TGA) module that uses optimal transport theory to model the correlation between subtype classification and survival analysis tasks, effectively transferring potential information. Extensive experiments demonstrate the superiority of our approaches, with MCTI outperforming state-of-the-art frameworks on three public benchmarks. \href{https://github.com/jsh0792/MCTI}{https://github.com/jsh0792/MCTI}.
Abstract:Deep learning has achieved impressive results in nuclei segmentation, but the massive requirement for pixel-wise labels remains a significant challenge. To alleviate the annotation burden, existing methods generate pseudo masks for model training using point labels. However, the generated masks are inevitably different from the ground truth, and these dissimilarities are not handled reasonably during the network training, resulting in the subpar performance of the segmentation model. To tackle this issue, we propose a framework named DoNuSeg, enabling \textbf{D}ynamic pseudo label \textbf{O}ptimization in point-supervised \textbf{Nu}clei \textbf{Seg}mentation. Specifically, DoNuSeg takes advantage of class activation maps (CAMs) to adaptively capture regions with semantics similar to annotated points. To leverage semantic diversity in the hierarchical feature levels, we design a dynamic selection module to choose the optimal one among CAMs from different encoder blocks as pseudo masks. Meanwhile, a CAM-guided contrastive module is proposed to further enhance the accuracy of pseudo masks. In addition to exploiting the semantic information provided by CAMs, we consider location priors inherent to point labels, developing a task-decoupled structure for effectively differentiating nuclei. Extensive experiments demonstrate that DoNuSeg outperforms state-of-the-art point-supervised methods. The code is available at https://github.com/shinning0821/MICCAI24-DoNuSeg.
Abstract:Weakly supervised segmentation methods have gained significant attention due to their ability to reduce the reliance on costly pixel-level annotations during model training. However, the current weakly supervised nuclei segmentation approaches typically follow a two-stage pseudo-label generation and network training process. The performance of the nuclei segmentation heavily relies on the quality of the generated pseudo-labels, thereby limiting its effectiveness. This paper introduces a novel domain-adaptive weakly supervised nuclei segmentation framework using cross-task interaction strategies to overcome the challenge of pseudo-label generation. Specifically, we utilize weakly annotated data to train an auxiliary detection task, which assists the domain adaptation of the segmentation network. To enhance the efficiency of domain adaptation, we design a consistent feature constraint module integrating prior knowledge from the source domain. Furthermore, we develop pseudo-label optimization and interactive training methods to improve the domain transfer capability. To validate the effectiveness of our proposed method, we conduct extensive comparative and ablation experiments on six datasets. The results demonstrate the superiority of our approach over existing weakly supervised approaches. Remarkably, our method achieves comparable or even better performance than fully supervised methods. Our code will be released in https://github.com/zhangye-zoe/DAWN.
Abstract:Multiple instance learning (MIL) is a robust paradigm for whole-slide pathological image (WSI) analysis, processing gigapixel-resolution images with slide-level labels. As pioneering efforts, attention-based MIL (ABMIL) and its variants are increasingly becoming popular due to the characteristics of simultaneously handling clinical diagnosis and tumor localization. However, the attention mechanism exhibits limitations in discriminating between instances, which often misclassifies tissues and potentially impairs MIL performance. This paper proposes an Attribute-Driven MIL (AttriMIL) framework to address these issues. Concretely, we dissect the calculation process of ABMIL and present an attribute scoring mechanism that measures the contribution of each instance to bag prediction effectively, quantifying instance attributes. Based on attribute quantification, we develop a spatial attribute constraint and an attribute ranking constraint to model instance correlations within and across slides, respectively. These constraints encourage the network to capture the spatial correlation and semantic similarity of instances, improving the ability of AttriMIL to distinguish tissue types and identify challenging instances. Additionally, AttriMIL employs a histopathology adaptive backbone that maximizes the pre-trained model's feature extraction capability for collecting pathological features. Extensive experiments on three public benchmarks demonstrate that our AttriMIL outperforms existing state-of-the-art frameworks across multiple evaluation metrics. The implementation code is available at https://github.com/MedCAI/AttriMIL.
Abstract:Positron emission tomography (PET) combined with computed tomography (CT) imaging is routinely used in cancer diagnosis and prognosis by providing complementary information. Automatically segmenting tumors in PET/CT images can significantly improve examination efficiency. Traditional multi-modal segmentation solutions mainly rely on concatenation operations for modality fusion, which fail to effectively model the non-linear dependencies between PET and CT modalities. Recent studies have investigated various approaches to optimize the fusion of modality-specific features for enhancing joint representations. However, modality-specific encoders used in these methods operate independently, inadequately leveraging the synergistic relationships inherent in PET and CT modalities, for example, the complementarity between semantics and structure. To address these issues, we propose a Hierarchical Adaptive Interaction and Weighting Network termed H2ASeg to explore the intrinsic cross-modal correlations and transfer potential complementary information. Specifically, we design a Modality-Cooperative Spatial Attention (MCSA) module that performs intra- and inter-modal interactions globally and locally. Additionally, a Target-Aware Modality Weighting (TAMW) module is developed to highlight tumor-related features within multi-modal features, thereby refining tumor segmentation. By embedding these modules across different layers, H2ASeg can hierarchically model cross-modal correlations, enabling a nuanced understanding of both semantic and structural tumor features. Extensive experiments demonstrate the superiority of H2ASeg, outperforming state-of-the-art methods on AutoPet-II and Hecktor2022 benchmarks. The code is released at https://github.com/JinPLu/H2ASeg.
Abstract:Semi-supervised segmentation methods have demonstrated promising results in natural scenarios, providing a solution to reduce dependency on manual annotation. However, these methods face significant challenges when directly applied to pathological images due to the subtle color differences between nuclei and tissues, as well as the significant morphological variations among nuclei. Consequently, the generated pseudo-labels often contain much noise, especially at the nuclei boundaries. To address the above problem, this paper proposes a boundary-aware contrastive learning network to denoise the boundary noise in a semi-supervised nuclei segmentation task. The model has two key designs: a low-resolution denoising (LRD) module and a cross-RoI contrastive learning (CRC) module. The LRD improves the smoothness of the nuclei boundary by pseudo-labels denoising, and the CRC enhances the discrimination between foreground and background by boundary feature contrastive learning. We conduct extensive experiments to demonstrate the superiority of our proposed method over existing semi-supervised instance segmentation methods.
Abstract:Nuclei instance segmentation in histopathological images is of great importance for biological analysis and cancer diagnosis but remains challenging for two reasons. (1) Similar visual presentation of intranuclear and extranuclear regions of chromophobe nuclei often causes under-segmentation, and (2) current methods lack the exploration of nuclei structure, resulting in fragmented instance predictions. To address these problems, this paper proposes a structure encoding and interaction network, termed SEINE, which develops the structure modeling scheme of nuclei and exploits the structure similarity between nuclei to improve the integrality of each segmented instance. Concretely, SEINE introduces a contour-based structure encoding (SE) that considers the correlation between nuclei structure and semantics, realizing a reasonable representation of the nuclei structure. Based on the encoding, we propose a structure-guided attention (SGA) that takes the clear nuclei as prototypes to enhance the structure learning for the fuzzy nuclei. To strengthen the structural learning ability, a semantic feature fusion (SFF) is presented to boost the semantic consistency of semantic and structure branches. Furthermore, a position enhancement (PE) method is applied to suppress incorrect nuclei boundary predictions. Extensive experiments demonstrate the superiority of our approaches, and SEINE achieves state-of-the-art (SOTA) performance on four datasets. The code is available at \href{https://github.com/zhangye-zoe/SEINE}{https://github.com/zhangye-zoe/SEINE}.
Abstract:Fluorodeoxyglucose (FDG) positron emission tomography (PET) combined with computed tomography (CT) is considered the primary solution for detecting some cancers, such as lung cancer and melanoma. Automatic segmentation of tumors in PET/CT images can help reduce doctors' workload, thereby improving diagnostic quality. However, precise tumor segmentation is challenging due to the small size of many tumors and the similarity of high-uptake normal areas to the tumor regions. To address these issues, this paper proposes a localization-to-segmentation framework (L2SNet) for precise tumor segmentation. L2SNet first localizes the possible lesions in the lesion localization phase and then uses the location cues to shape the segmentation results in the lesion segmentation phase. To further improve the segmentation performance of L2SNet, we design an adaptive threshold scheme that takes the segmentation results of the two phases into consideration. The experiments with the MICCAI 2023 Automated Lesion Segmentation in Whole-Body FDG-PET/CT challenge dataset show that our method achieved a competitive result and was ranked in the top 7 methods on the preliminary test set. Our work is available at: https://github.com/MedCAI/L2SNet.
Abstract:Ovarian cancer is one of the most harmful gynecological diseases. Detecting ovarian tumors in early stage with computer-aided techniques can efficiently decrease the mortality rate. With the improvement of medical treatment standard, ultrasound images are widely applied in clinical treatment. However, recent notable methods mainly focus on single-modality ultrasound ovarian tumor segmentation or recognition, which means there still lacks of researches on exploring the representation capability of multi-modality ultrasound ovarian tumor images. To solve this problem, we propose a Multi-Modality Ovarian Tumor Ultrasound (MMOTU) image dataset containing 1469 2d ultrasound images and 170 contrast enhanced ultrasonography (CEUS) images with pixel-wise and global-wise annotations. Based on MMOTU, we mainly focus on unsupervised cross-domain semantic segmentation task. To solve the domain shift problem, we propose a feature alignment based architecture named Dual-Scheme Domain-Selected Network (DS$^2$Net). Specifically, we first design source-encoder and target-encoder to extract two-style features of source and target images. Then, we propose Domain-Distinct Selected Module (DDSM) and Domain-Universal Selected Module (DUSM) to extract the distinct and universal features in two styles (source-style or target-style). Finally, we fuse these two kinds of features and feed them into the source-decoder and target-decoder to generate final predictions. Extensive comparison experiments and analysis on MMOTU image dataset show that DS$^2$Net can boost the segmentation performance for bidirectional cross-domain adaptation of 2d ultrasound images and CEUS images.