Abstract: Computational pathology, which integrates computational methods with digital imaging, has been shown to be effective in advancing disease diagnosis and prognosis. In recent years, progress in machine learning and deep learning has greatly strengthened computational pathology. However, data scarcity and data imbalance remain open problems, and both can have an adverse effect on any computational method. In this paper, we introduce an efficient and effective data augmentation strategy that generates new pathology images from existing ones and thus enriches datasets without additional data collection or annotation costs. To evaluate the proposed method, we employed two colorectal cancer datasets and obtained improved classification results, suggesting that this simple approach has the potential to alleviate data scarcity and imbalance in computational pathology.
Abstract: In computational pathology, researchers often face challenges due to the scarcity of labeled pathology datasets. Data augmentation is a crucial technique for mitigating this limitation. In this study, we introduce an efficient data augmentation method for pathology images, called USegMix. Given a set of pathology images, the proposed method generates a new, synthetic image in two phases. In the first phase, USegMix constructs a pool of tissue segments in an automated and unsupervised manner using superpixels and the Segment Anything Model (SAM). In the second phase, USegMix selects a candidate segment in a target image, replaces it with a similar segment from the segment pool, and blends the inserted segment into the target image using a pre-trained diffusion model. In this way, USegMix can generate diverse and realistic pathology images. We evaluate the effectiveness of USegMix on two pathology image datasets, one of colorectal cancer and one of prostate cancer. The results demonstrate improvements in cancer classification performance, underscoring the potential of USegMix for pathology image analysis.
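The two-phase procedure described above can be illustrated with a minimal sketch. It is not the USegMix implementation: a regular grid stands in for the superpixel and SAM segmentation, similarity is assumed to be the distance between mean colors, and the diffusion-based blending step is omitted entirely. The names `grid_segments` and `segment_mix` are hypothetical.

```python
import numpy as np

def grid_segments(image, cell):
    """Phase 1 stand-in: cut the image into square cells.

    The abstract uses superpixels and SAM; a regular grid is an
    assumption made here to keep the sketch self-contained.
    """
    h, w = image.shape[:2]
    return [(y, x, image[y:y + cell, x:x + cell].copy())
            for y in range(0, h - cell + 1, cell)
            for x in range(0, w - cell + 1, cell)]

def segment_mix(target, pool, cell, rng):
    """Phase 2 stand-in: swap one target cell for the most similar pool segment.

    Similarity is the Euclidean distance between mean colors (an assumption).
    No blending is performed; the segment is pasted as-is.
    """
    out = target.copy()
    candidates = grid_segments(target, cell)
    y, x, patch = candidates[rng.integers(len(candidates))]
    feat = patch.mean(axis=(0, 1))
    dists = [np.linalg.norm(seg.mean(axis=(0, 1)) - feat) for _, _, seg in pool]
    best = pool[int(np.argmin(dists))][2]
    out[y:y + cell, x:x + cell] = best
    return out, (y, x)

# Random arrays stand in for pathology image tiles.
rng = np.random.default_rng(0)
source = rng.random((64, 64, 3))
target = rng.random((64, 64, 3))
pool = grid_segments(source, cell=16)
mixed, (y, x) = segment_mix(target, pool, cell=16, rng=rng)
```

In this sketch the pasted cell has hard edges; the blending with a pre-trained diffusion model that the abstract describes is what would remove such seams.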
Abstract: Nuclei instance segmentation is an essential task in pathology image analysis, serving as the foundation for many downstream applications. The release of several public datasets has significantly advanced research in this area, yet many existing methods struggle with data imbalance. To address this challenge, this study introduces a data augmentation method, called NucleiMix, which is designed to balance the distribution of nuclei types by increasing the number of rare-type nuclei within datasets. NucleiMix operates in two phases. In the first phase, it identifies candidate locations whose surroundings are similar to those of rare-type nuclei and inserts rare-type nuclei at these locations. In the second phase, it employs a progressive inpainting strategy using a pre-trained diffusion model to seamlessly integrate the rare-type nuclei into their new environments, in place of major-type nuclei or background. We systematically evaluate the effectiveness of NucleiMix on three public datasets using two popular nuclei instance segmentation models. The results demonstrate that NucleiMix synthesizes realistic rare-type nuclei and improves the accuracy and robustness of nuclei segmentation and classification.
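The first phase described above (matching a candidate location to the surroundings of a rare-type nucleus, then inserting the nucleus there) can be sketched as follows. This is an illustration under stated assumptions, not the NucleiMix implementation: context similarity is taken to be the difference in mean intensity over a square window, the image is a synthetic grayscale array, and the progressive inpainting phase is omitted. The names `context_mean` and `insert_rare_nucleus` are hypothetical.

```python
import numpy as np

def context_mean(image, cy, cx, r):
    """Mean intensity of the (2r+1) x (2r+1) window centered at (cy, cx)."""
    return image[cy - r:cy + r + 1, cx - r:cx + r + 1].mean()

def insert_rare_nucleus(image, nucleus, mask, src_center, candidates, r):
    """Paste `nucleus` at the candidate whose window best matches the source window.

    Similarity is the absolute difference of window means (an assumption).
    Only pixels where `mask` is True are copied; no inpainting is applied.
    """
    ref = context_mean(image, *src_center, r)
    best = min(candidates, key=lambda c: abs(context_mean(image, *c, r) - ref))
    out = image.copy()
    h, w = nucleus.shape
    y0, x0 = best[0] - h // 2, best[1] - w // 2
    region = out[y0:y0 + h, x0:x0 + w]  # view into `out`
    region[mask] = nucleus[mask]
    return out, best

# Synthetic image: dark tissue on the left, bright tissue on the right.
image = np.full((40, 40), 0.2)
image[:, 20:] = 0.8
image[8:13, 28:33] = 0.1  # a "rare-type nucleus" in the bright region

nucleus = image[8:13, 28:33].copy()
yy, xx = np.ogrid[-2:3, -2:3]
mask = yy ** 2 + xx ** 2 <= 4  # roughly circular nucleus mask

candidates = [(30, 8), (30, 30)]  # one per tissue region
augmented, chosen = insert_rare_nucleus(
    image, nucleus, mask, src_center=(10, 30), candidates=candidates, r=6
)
```

Here the candidate in the bright region is selected because its window mean is closer to that of the source nucleus. In the method the abstract describes, the second phase would then inpaint around the pasted nucleus so that it fits its new environment.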