Abstract:Deep supervised learning models require a high volume of labeled data to attain sufficiently good results; however, the practice of gathering and annotating such big data is costly and laborious. Recently, the application of self-supervised learning (SSL) in vision tasks has gained significant attention. The intuition behind SSL is to exploit the synchronous relationships within the data as a form of self-supervision, which can be versatile. In the current big data era, most of the data is unlabeled, and the success of SSL thus relies on finding ways to leverage this vast amount of unlabeled data. It is therefore preferable for deep learning algorithms to reduce reliance on human supervision and instead focus on self-supervision based on the inherent relationships within the data. With the advent of Vision Transformers (ViTs), which have achieved remarkable results in computer vision, it is crucial to explore and understand the various SSL mechanisms employed for training these models, specifically in scenarios where limited labeled data are available. In this survey, we develop a comprehensive taxonomy that systematically classifies SSL techniques according to their representations and the pre-training tasks applied. Additionally, we discuss the motivations behind SSL, review popular pre-training tasks, and highlight the challenges and advancements in this field. Furthermore, we present a comparative analysis of different SSL methods, evaluate their strengths and limitations, and identify potential avenues for future research.
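To make the notion of a pre-training task concrete, the sketch below implements a SimCLR-style contrastive objective in PyTorch, one of the popular SSL pre-training tasks such surveys review. The function name, temperature, and embedding sizes are illustrative assumptions, not details taken from the survey.

```python
# Minimal sketch of a contrastive (SimCLR-style) self-supervised objective.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over embeddings z1, z2 of two augmented views of the same batch."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)        # (2N, d), unit-norm embeddings
    sim = z @ z.t() / temperature                             # pairwise cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))                # remove self-similarity
    # positives: the i-th embedding in z1 pairs with the i-th in z2, and vice versa
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# usage: embeddings of two augmentations of the same images (hypothetical encoder output)
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
loss = nt_xent_loss(z1, z2)
```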
Abstract:Vision transformers have recently become popular as a possible alternative to convolutional neural networks (CNNs) for a variety of computer vision applications. Owing to their ability to model global relationships in images, vision transformers have large capacity, but they may generalize more poorly than CNNs. Very recently, the hybridization of convolution and self-attention mechanisms in vision transformers has been gaining popularity due to its ability to exploit both local and global image representations. These CNN-Transformer architectures, also known as hybrid vision transformers, have shown remarkable results for vision applications. Given the rapidly growing number of these hybrid vision transformers, there is a need for a taxonomy and explanation of these architectures. This survey presents a taxonomy of recent vision transformer architectures and, more specifically, of hybrid vision transformers. Additionally, the key features of each architecture, such as the attention mechanisms, positional embeddings, multi-scale processing, and convolution, are discussed. This survey highlights the potential of hybrid vision transformers to achieve outstanding performance on a variety of computer vision tasks. Moreover, it also points towards the future directions of this rapidly evolving field.
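As an illustration of the hybridization discussed above, the following PyTorch sketch combines a depthwise convolution (local representation) with multi-head self-attention (global representation) in a single block. It is a generic pattern assumed for illustration, not a specific architecture from the survey.

```python
# Generic hybrid CNN-Transformer block: local convolution followed by global self-attention.
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.local = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)  # depthwise, local cues
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)          # global relationships
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):                          # x: (B, C, H, W)
        x = x + self.local(x)                      # convolutional (local) branch
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)           # (B, H*W, C) token sequence
        t = t + self.attn(self.norm(t), self.norm(t), self.norm(t))[0]
        t = t + self.mlp(self.norm(t))
        return t.transpose(1, 2).reshape(b, c, h, w)

# usage on a small feature map
y = HybridBlock()(torch.randn(2, 64, 14, 14))
```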
Abstract:Transformers, due to their ability to learn long-range dependencies, have overcome the shortcomings of convolutional neural networks (CNNs) for global perspective learning. Therefore, they have gained the focus of researchers for several vision-related tasks, including medical diagnosis. However, their multi-head attention module only captures global-level feature representations, which is insufficient for medical images. To address this issue, we propose a Channel Boosted Hybrid Vision Transformer (CB HVT) that uses transfer learning to generate boosted channels and employs both transformers and CNNs to analyze lymphocytes in histopathological images. The proposed CB HVT comprises five modules: a channel generation module, a channel exploitation module, a channel merging module, a region-aware module, and a detection and segmentation head, which work together to effectively identify lymphocytes. The channel generation module uses the idea of channel boosting through transfer learning to extract diverse channels from different auxiliary learners. In the CB HVT, these boosted channels are first concatenated and ranked using an attention mechanism in the channel exploitation module. A fusion block is then utilized in the channel merging module for a gradual and systematic merging of the diverse boosted channels to improve the network's learning representations. The CB HVT also employs a proposal network in its region-aware module and a head to effectively identify objects, even in overlapping regions and in the presence of artifacts. We evaluated the proposed CB HVT on two publicly available datasets for lymphocyte assessment in histopathological images. The results show that the CB HVT outperformed other state-of-the-art detection models and has good generalization ability, demonstrating its value as a tool for pathologists.
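The sketch below illustrates, under assumed module names and channel sizes, how channel generation (auxiliary backbones), channel exploitation (attention-based re-weighting), and channel merging (1x1 fusion) might be wired together in PyTorch. It is not the authors' exact CB HVT design; the untrained ResNet-18 backbones merely stand in for transfer-learned auxiliary learners.

```python
# Hypothetical channel generation -> exploitation -> merging pipeline.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ChannelBoosting(nn.Module):
    def __init__(self, fused_dim=256):
        super().__init__()
        backbones = [resnet18(weights=None), resnet18(weights=None)]     # auxiliary learners (untrained here)
        self.generators = nn.ModuleList(
            [nn.Sequential(*list(b.children())[:-2]) for b in backbones])  # drop avgpool + fc
        dim = 512 * len(backbones)                       # concatenated boosted channels
        self.exploit = nn.Sequential(                    # attention-based channel ranking (SE-style)
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(dim, dim // 16), nn.ReLU(),
            nn.Linear(dim // 16, dim), nn.Sigmoid())
        self.merge = nn.Conv2d(dim, fused_dim, kernel_size=1)  # fusion of the boosted channels

    def forward(self, x):
        boosted = torch.cat([g(x) for g in self.generators], dim=1)
        weights = self.exploit(boosted).unsqueeze(-1).unsqueeze(-1)
        return self.merge(boosted * weights)             # fused representation for a downstream detector

feats = ChannelBoosting()(torch.randn(1, 3, 224, 224))   # -> (1, 256, 7, 7)
```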
Abstract:Designing an intrusion detection system is difficult because network traffic encompasses various attack types, including new and evolving ones with minor changes. The data used to construct a predictive model has a skewed class distribution and limited representation of attack types, which differ from real network traffic. These limitations result in dataset shift, negatively impacting the machine learning models' predictive abilities and reducing the detection rate against novel attacks. To address the challenge of dataset shift, we introduce the INformation FUsion and Stacking Ensemble (INFUSE) for network intrusion detection, whose predictive power is further improved by a deep neural network-based Meta-Learner. First, a hybrid feature space is created by integrating decision and feature spaces. Five different classifiers are utilized to generate a pool of decision spaces. The feature space is then enriched through a deep sparse autoencoder that learns the semantic relationships between attacks. Finally, the deep Meta-Learner acts as an ensemble combiner to analyze the hybrid feature space and make a final decision. Our evaluation on stringent benchmark datasets and comparison to existing techniques showed the effectiveness of INFUSE, with an F-Score of 0.91, accuracy of 91.6%, and recall of 0.94 on the Test+ dataset, and an F-Score of 0.91, accuracy of 85.6%, and recall of 0.87 on the stringent Test-21 dataset. These promising results indicate that the proposed technique has strong generalization capability and the potential to detect network attacks.
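A hedged sketch of the fusion idea with scikit-learn stand-ins: decision spaces from a pool of five classifiers and an autoencoder-enriched feature space are concatenated into a hybrid space for a neural meta-learner. The classifier choices, layer sizes, and synthetic data are assumptions for illustration, not the paper's configuration.

```python
# Hybrid decision + feature space fed to a neural meta-learner (simplified INFUSE-style stacking).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPRegressor, MLPClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=40, random_state=0)   # stand-in for NSL-KDD-like data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 1) decision space: probabilistic outputs of a pool of five base classifiers
#    (in practice, out-of-fold predictions would be used to avoid leakage)
pool = [RandomForestClassifier(random_state=0), GradientBoostingClassifier(random_state=0),
        LogisticRegression(max_iter=1000), GaussianNB(), KNeighborsClassifier()]
dec_tr = np.hstack([c.fit(X_tr, y_tr).predict_proba(X_tr) for c in pool])
dec_te = np.hstack([c.predict_proba(X_te) for c in pool])

# 2) feature space enriched by a bottleneck reconstruction model (sparse-autoencoder-like stand-in)
ae = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0).fit(X_tr, X_tr)
enc_tr, enc_te = ae.predict(X_tr), ae.predict(X_te)

# 3) hybrid space + deep meta-learner as the ensemble combiner
meta = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0)
meta.fit(np.hstack([X_tr, dec_tr, enc_tr]), y_tr)
print("meta-learner accuracy:", meta.score(np.hstack([X_te, dec_te, enc_te]), y_te))
```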
Abstract:The Coronavirus (COVID-19) outbreak in December 2019 has become an ongoing threat to humans worldwide, creating a health crisis that has infected millions of people, as well as devastating the global economy. Deep learning (DL) techniques have proved helpful in the timely analysis and delineation of infectious regions in radiological images. This paper makes an in-depth survey of DL techniques and draws a taxonomy based on diagnostic strategies and learning approaches. DL techniques are systematically categorized into classification, segmentation, and multi-stage approaches for COVID-19 diagnosis at image- and region-level analysis. Each category includes pre-trained and custom-made Convolutional Neural Network architectures for detecting COVID-19 infection in the radiographic imaging modalities X-Ray and Computed Tomography (CT). Furthermore, a discussion is made on the challenges of developing diagnostic techniques during a pandemic, cross-platform interoperability, and examining imaging modalities, in addition to reviewing the methodologies and performance measures used in these techniques. This survey provides insight into promising areas of research in DL for analyzing radiographic images and thus may further accelerate research into the design of customized DL-based diagnostic tools for effectively dealing with new variants of COVID-19 and emerging challenges.
Abstract:COVID-19 is a highly contagious respiratory infection that has affected a large population across the world and continues with its devastating consequences. It is imperative to detect COVID-19 as early as possible to limit the spread of infection. In this work, a new classification technique, CB-STM-RENet, based on a deep Convolutional Neural Network (CNN) and Channel Boosting is proposed for the screening of COVID-19 in chest X-Rays. In this connection, to learn the COVID-19 specific radiographic patterns, a new convolution block based on split-transform-merge (STM) is developed. This new block systematically incorporates region- and edge-based operations at each branch to capture a diverse set of features at various levels, especially those related to region homogeneity, textural variations, and the boundaries of the infected region. The learning and discrimination capability of the proposed CNN architecture is enhanced by exploiting the Channel Boosting idea, which concatenates auxiliary channels with the original channels. The auxiliary channels are generated from pre-trained CNNs using Transfer Learning. The effectiveness of the proposed CB-STM-RENet is evaluated on three different chest X-Ray datasets, namely CoV-Healthy-6k, CoV-NonCoV-10k, and CoV-NonCoV-15k. The performance comparison of the proposed CB-STM-RENet with existing techniques exhibits high performance both in discriminating COVID-19 chest infections from healthy cases and from other types of chest infections. CB-STM-RENet provides the highest performance on all three datasets, especially on the stringent CoV-NonCoV-15k dataset. The good detection rate (97%) and high precision (93%) of the proposed technique suggest that it can be adapted for the diagnosis of COVID-19 infected patients. The test code is available at https://github.com/PRLAB21/COVID-19-Detection-System-using-Chest-X-Ray-Images.
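The following PyTorch sketch shows a generic split-transform-merge block that pairs average pooling (region-based) with max pooling (edge-based) on parallel branches before merging. The branch design and channel sizes are assumed for illustration and do not reproduce the exact CB-STM-RENet block.

```python
# Generic split-transform-merge (STM) style block with region- and edge-based branches.
import torch
import torch.nn as nn

class STMBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch // 2, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch // 2), nn.ReLU(inplace=True))
        self.region_branch = branch()                        # region homogeneity cues
        self.edge_branch = branch()                          # boundary / texture cues
        self.region_pool = nn.AvgPool2d(3, stride=1, padding=1)   # region-based operation
        self.edge_pool = nn.MaxPool2d(3, stride=1, padding=1)     # edge-based operation
        self.merge = nn.Conv2d(out_ch, out_ch, kernel_size=1)

    def forward(self, x):                                    # split -> transform -> merge
        region = self.region_pool(self.region_branch(x))
        edge = self.edge_pool(self.edge_branch(x))
        return self.merge(torch.cat([region, edge], dim=1))

out = STMBlock(3, 64)(torch.randn(1, 3, 128, 128))           # -> (1, 64, 128, 128)
```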
Abstract:COVID-19 is a global health problem. Consequently, early detection and analysis of the infection patterns are crucial for controlling infection spread as well as devising a treatment plan. This work proposes a two-stage deep Convolutional Neural Network (CNN) based framework for the delineation of COVID-19 infected regions in lung CT images. In the first stage, COVID-19 specific CT image features are enhanced using a two-level discrete wavelet transformation. These enhanced CT images are then classified using the proposed custom-made deep CoV-CTNet. In the second stage, the CT images classified as infectious are provided to the segmentation models for the identification and analysis of COVID-19 infectious regions. In this regard, we propose a novel semantic segmentation model, CoV-RASeg, which systematically uses average and max pooling operations in the encoder and decoder blocks. This systematic utilization of max and average pooling operations helps the proposed CoV-RASeg simultaneously learn both the boundaries and region homogeneity. Moreover, the idea of attention is incorporated to deal with mildly infected regions. The proposed two-stage framework is evaluated on a standard lung CT image dataset, and its performance is compared with existing deep CNN models. The performance of the proposed CoV-CTNet is evaluated using the Matthews Correlation Coefficient (MCC) measure (0.98) and that of the proposed CoV-RASeg using the Dice Similarity (DS) score (0.95). The promising results on an unseen test set suggest that the proposed framework has the potential to help radiologists in the identification and analysis of COVID-19 infected regions.
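A minimal sketch of the first-stage enhancement step, assuming the PyWavelets library, a Haar wavelet, and a simple gain on the detail sub-bands; the paper's exact enhancement settings are not reproduced here.

```python
# Two-level 2-D discrete wavelet enhancement of a CT slice (illustrative settings).
import numpy as np
import pywt

def wavelet_enhance(image, wavelet="haar", level=2, gain=1.5):
    coeffs = pywt.wavedec2(image, wavelet, level=level)          # [cA, (cH, cV, cD), ...]
    enhanced = [coeffs[0]] + [tuple(gain * d for d in detail) for detail in coeffs[1:]]
    return pywt.waverec2(enhanced, wavelet)                      # edge/texture-enhanced slice

ct_slice = np.random.rand(256, 256).astype(np.float32)           # stand-in for a lung CT slice
enhanced = wavelet_enhance(ct_slice)
```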
Abstract:Empirical evaluation of breast tissue biopsies for mitotic nuclei detection is considered an important prognostic biomarker in tumor grading and cancer progression. However, automated mitotic nuclei detection poses several challenges because of the unavailability of pixel-level annotations, the different morphological configurations of mitotic nuclei, their sparse representation, and their close resemblance to non-mitotic nuclei. These challenges undermine the precision of automated detection models and thus make detection difficult in a single phase. This work proposes an end-to-end detection system for mitotic nuclei identification in breast cancer histopathological images. Deep object detection-based Mask R-CNN is adapted for mitotic nuclei detection; it initially selects candidate mitotic regions with maximum recall, and in the second phase these candidate regions are refined by a multi-object loss function to improve precision. The proposed detection model shows improved discrimination ability (F-score of 0.86) for mitotic nuclei with significant precision (0.86) compared to two-stage detection models (F-score of 0.701) on the TUPAC16 dataset. The promising results suggest that the deep object detection-based model has the potential to learn the characteristic features of mitotic nuclei from weakly annotated data and can be adapted for the identification of other nuclear bodies in histopathological images.
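The snippet below sketches how torchvision's Mask R-CNN can be adapted to a two-class (background vs. mitotic nucleus) detection task, following the standard fine-tuning recipe. The candidate-selection strategy and multi-object loss refinement described above are not reproduced, and the class count and hidden sizes are assumptions.

```python
# Adapting torchvision Mask R-CNN heads for a hypothetical mitotic-nucleus detection task.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 2                                   # background + mitotic nucleus
model = maskrcnn_resnet50_fpn(weights=None)       # pre-trained weights could be used instead
in_feats = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes)
in_feats_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feats_mask, 256, num_classes)

# training mode expects images plus target dicts with boxes, labels, and masks
model.train()
images = [torch.rand(3, 512, 512)]
targets = [{"boxes": torch.tensor([[100., 100., 150., 150.]]),
            "labels": torch.tensor([1]),
            "masks": torch.zeros(1, 512, 512, dtype=torch.uint8)}]
losses = model(images, targets)                   # dict of classification, box, and mask losses
```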
Abstract:Deep Convolutional Neural Networks (CNNs) are a special type of Neural Network that has shown state-of-the-art results on various competitive benchmarks. The powerful learning ability of deep CNNs is largely achieved through the use of multiple non-linear feature extraction stages that can automatically learn hierarchical representations from the data. The availability of a large amount of data and improvements in hardware processing units have accelerated research in CNNs, and recently very interesting deep CNN architectures have been reported. The recent race in deep CNN architectures for achieving high performance on challenging benchmarks has shown that innovative architectural ideas, as well as parameter optimization, can improve CNN performance on various vision-related tasks. In this regard, different ideas in CNN design have been explored, such as the use of different activation and loss functions, parameter optimization, regularization, and the restructuring of processing units. However, the major improvement in representational capacity has been achieved by restructuring the processing units. In particular, the idea of using a block as a structural unit instead of a layer is gaining substantial appreciation. This survey thus focuses on the intrinsic taxonomy present in recently reported CNN architectures and, consequently, classifies the recent innovations in CNN architectures into seven different categories. These seven categories are based on spatial exploitation, depth, multi-path, width, feature-map exploitation, channel boosting, and attention. Additionally, it covers the elementary understanding of CNN components and sheds light on the current challenges and applications of CNNs.
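To make the "block as a structural unit" idea concrete, the snippet below shows a textbook residual block stacked to form a deep CNN; it is a generic example, not any specific architecture from the survey.

```python
# A residual block reused as the structural unit of a deep CNN.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels))
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.body(x))            # identity shortcut around the block

# the network is assembled by stacking blocks rather than individual layers
net = nn.Sequential(*[ResidualBlock(32) for _ in range(4)])
y = net(torch.randn(1, 32, 56, 56))
```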
Abstract:We present a novel architectural enhancement of Channel Boosting in deep convolutional neural networks (CNNs). The idea of Channel Boosting exploits both the channel dimension of the CNN (learning from multiple channels) and Transfer Learning (TL). TL is utilized at two different stages: channel generation and channel exploitation. A deep CNN is boosted by various channels available through TL from already trained deep NNs, in addition to its own original channel. The deep architecture of the CNN then exploits the original and boosted channels downstream for learning discriminative patterns. Churn prediction in telecom is a challenging task due to the high dimensionality and imbalanced nature of the data, and it is therefore used to evaluate the performance of the proposed Channel Boosted CNN (CB-CNN). In the first phase, discriminative informative features are extracted using a stacked autoencoder, and in the second phase, these features are combined with the original features to form Channel Boosted images. Finally, a pre-trained CNN is exploited by employing TL to perform classification. The results are promising and show the ability of the Channel Boosting concept to learn complex classification problems by discerning even minute differences between churners and non-churners. The proposed work validates the concept, observed from the evolution of recent CNN architectures, that innovative restructuring may increase the representational capacity of the network.
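A hedged sketch of this pipeline on toy tabular data: a stacked-autoencoder-style network produces an auxiliary representation that is stacked with the original features as a second channel of a small "image" fed to a CNN. The dimensions, reshaping, and toy CNN are illustrative assumptions, not the paper's exact design.

```python
# Channel boosting of tabular features with an autoencoder-derived auxiliary channel.
import torch
import torch.nn as nn

n_features = 64                                           # assume 64 tabular features -> 8x8 map
autoencoder = nn.Sequential(                              # stacked-autoencoder stand-in (pre-trained in practice)
    nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, n_features))

def channel_boost(x):                                     # x: (B, 64) original features
    boosted = autoencoder(x)                              # auxiliary channel from the autoencoder
    img = torch.stack([x, boosted], dim=1)                # (B, 2, 64): original + boosted channel
    return img.view(-1, 2, 8, 8)                          # 2-channel "image" for the CNN

cnn = nn.Sequential(                                      # stand-in for a TL pre-trained CNN classifier
    nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(16, 2))                       # churner vs. non-churner

logits = cnn(channel_boost(torch.randn(4, n_features)))
```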