Abstract:The data-intensive nature of supervised classification drives the interest of the researchers towards unsupervised approaches, especially for problems such as medical image segmentation, where labeled data is scarce. Building on the recent advancements of Vision transformers (ViT) in computer vision, we propose an unsupervised segmentation framework using a pre-trained Dino-ViT. In the proposed method, we leverage the inherent graph structure within the image to realize a significant performance gain for segmentation in medical images. For this, we introduce a modularity-based loss function coupled with a Graph Attention Network (GAT) to effectively capture the inherent graph topology within the image. Our method achieves state-of-the-art performance, even significantly surpassing or matching that of existing (semi)supervised technique such as MedSAM which is a Segment Anything Model in medical images. We demonstrate this using two challenging medical image datasets ISIC-2018 and CVC-ColonDB. This work underscores the potential of unsupervised approaches in advancing medical image analysis in scenarios where labeled data is scarce. The github repository of the code is available on [https://github.com/mudit-adityaja/UnSegMedGAT].
Abstract:The data-hungry approach of supervised classification drives the interest of the researchers toward unsupervised approaches, especially for problems such as medical image segmentation, where labeled data are difficult to get. Motivated by the recent success of Vision transformers (ViT) in various computer vision tasks, we propose an unsupervised segmentation framework with a pre-trained ViT. Moreover, by harnessing the graph structure inherent within the image, the proposed method achieves a notable performance in segmentation, especially in medical images. We further introduce a modularity-based loss function coupled with an Auto-Regressive Moving Average (ARMA) filter to capture the inherent graph topology within the image. Finally, we observe that employing Scaled Exponential Linear Unit (SELU) and SILU (Swish) activation functions within the proposed Graph Neural Network (GNN) architecture enhances the performance of segmentation. The proposed method provides state-of-the-art performance (even comparable to supervised methods) on benchmark image segmentation datasets such as ECSSD, DUTS, and CUB, as well as challenging medical image segmentation datasets such as KVASIR, CVC-ClinicDB, ISIC-2018. The github repository of the code is available on \url{https://github.com/ksgr5566/UnSeGArmaNet}.
Abstract:Image segmentation, the process of partitioning an image into meaningful regions, plays a pivotal role in computer vision and medical imaging applications. Unsupervised segmentation, particularly in the absence of labeled data, remains a challenging task due to the inter-class similarity and variations in intensity and resolution. In this study, we extract high-level features of the input image using pretrained vision transformer. Subsequently, the proposed method leverages the underlying graph structures of the images, seeking to discover and delineate meaningful boundaries using graph neural networks and modularity based optimization criteria without relying on pre-labeled training data. Experimental results on benchmark datasets demonstrate the effectiveness and versatility of the proposed approach, showcasing competitive performance compared to the state-of-the-art unsupervised segmentation methods. This research contributes to the broader field of unsupervised medical imaging and computer vision by presenting an innovative methodology for image segmentation that aligns with real-world challenges. The proposed method holds promise for diverse applications, including medical imaging, remote sensing, and object recognition, where labeled data may be scarce or unavailable. The github repository of the code is available on [https://github.com/ksgr5566/unseggnet]