Abstract:This work introduces EffiSegNet, a novel segmentation framework leveraging transfer learning with a pre-trained Convolutional Neural Network (CNN) classifier as its backbone. Deviating from traditional architectures with a symmetric U-shape, EffiSegNet simplifies the decoder and utilizes full-scale feature fusion to minimize computational cost and the number of parameters. We evaluated our model on the gastrointestinal polyp segmentation task using the publicly available Kvasir-SEG dataset, achieving state-of-the-art results. Specifically, the EffiSegNet-B4 network variant achieved an F1 score of 0.9552, mean Dice (mDice) 0.9483, mean Intersection over Union (mIoU) 0.9056, Precision 0.9679, and Recall 0.9429 with a pre-trained backbone - to the best of our knowledge, the highest reported scores in the literature for this dataset. Additional training from scratch also demonstrated exceptional performance compared to previous work, achieving an F1 score of 0.9286, mDice 0.9207, mIoU 0.8668, Precision 0.9311 and Recall 0.9262. These results underscore the importance of a well-designed encoder in image segmentation networks and the effectiveness of transfer learning approaches.
Abstract:Deep Convolutional Neural Networks (CNNs) for image classification successively alternate convolutions and downsampling operations, such as pooling layers or strided convolutions, resulting in lower resolution features the deeper the network gets. These downsampling operations save computational resources and provide some translational invariance as well as a bigger receptive field at the next layers. However, an inherent side-effect of this is that high-level features, produced at the deep end of the network, are always captured in low resolution feature maps. The inverse is also true, as shallow layers always contain small scale features. In biomedical image analysis engineers are often tasked with classifying very small image patches which carry only a limited amount of information. By their nature, these patches may not even contain objects, with the classification depending instead on the detection of subtle underlying patterns with an unknown scale in the image's texture. In these cases every bit of information is valuable; thus, it is important to extract the maximum number of informative features possible. Driven by these considerations, we introduce a new CNN architecture which preserves multi-scale features from deep, intermediate, and shallow layers by utilizing skip connections along with consecutive contractions and expansions of the feature maps. Using a dataset of very low resolution patches from Pancreatic Ductal Adenocarcinoma (PDAC) CT scans we demonstrate that our network can outperform current state of the art models.