Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Pavan Chakraborty

Polyp detection in colonoscopy images using YOLOv11

Jan 15, 2025

Alok Ranjan Sahoo, Satya Sangram Sahoo, Pavan Chakraborty

Abstract:Colorectal cancer (CRC) is one of the most commonly diagnosed cancers all over the world. It starts as a polyp in the inner lining of the colon. To prevent CRC, early polyp detection is required. Colonosopy is used for the inspection of the colon. Generally, the images taken by the camera placed at the tip of the endoscope are analyzed by the experts manually. Various traditional machine learning models have been used with the rise of machine learning. Recently, deep learning models have shown more effectiveness in polyp detection due to their superiority in generalizing and learning small features. These deep learning models for object detection can be segregated into two different types: single-stage and two-stage. Generally, two stage models have higher accuracy than single stage ones but the single stage models have low inference time. Hence, single stage models are easy to use for quick object detection. YOLO is one of the singlestage models used successfully for polyp detection. It has drawn the attention of researchers because of its lower inference time. The researchers have used Different versions of YOLO so far, and with each newer version, the accuracy of the model is increasing. This paper aims to see the effectiveness of the recently released YOLOv11 to detect polyp. We analyzed the performance for all five models of YOLOv11 (YOLO11n, YOLO11s, YOLO11m, YOLO11l, YOLO11x) with Kvasir dataset for the training and testing. Two different versions of the dataset were used. The first consisted of the original dataset, and the other was created using augmentation techniques. The performance of all the models with these two versions of the dataset have been analysed.

Via

Access Paper or Ask Questions

Point-GR: Graph Residual Point Cloud Network for 3D Object Classification and Segmentation

Dec 04, 2024

Md Meraz, Md Afzal Ansari, Mohammed Javed, Pavan Chakraborty

Abstract:In recent years, the challenge of 3D shape analysis within point cloud data has gathered significant attention in computer vision. Addressing the complexities of effective 3D information representation and meaningful feature extraction for classification tasks remains crucial. This paper presents Point-GR, a novel deep learning architecture designed explicitly to transform unordered raw point clouds into higher dimensions while preserving local geometric features. It introduces residual-based learning within the network to mitigate the point permutation issues in point cloud data. The proposed Point-GR network significantly reduced the number of network parameters in Classification and Part-Segmentation compared to baseline graph-based networks. Notably, the Point-GR model achieves a state-of-the-art scene segmentation mean IoU of 73.47% on the S3DIS benchmark dataset, showcasing its effectiveness. Furthermore, the model shows competitive results in Classification and Part-Segmentation tasks.

* ICPR 2024 G2SP-CV Workshop, Dec 1-5, 2024 Kolkata, India

Via

Access Paper or Ask Questions

Inverse kinematics learning of a continuum manipulator using limited real time data

Mar 27, 2024

Alok Ranjan Sahoo, Pavan Chakraborty

Abstract:Data driven control of a continuum manipulator requires a lot of data for training but generating sufficient amount of real time data is not cost efficient. Random actuation of the manipulator can also be unsafe sometimes. Meta learning has been used successfully to adapt to a new environment. Hence, this paper tries to solve the above mentioned problem using meta learning. We consider two cases for that. First, this paper proposes a method to use simulation data for training the model using MAML(Model-Agnostic Meta-Learning). Then, it adapts to the real world using gradient steps. Secondly,if the simulation model is not available or difficult to formulate, then we propose a CGAN(Conditional Generative adversial network)-MAML based method for it. The model is trained using a small amount of real time data and augmented data for different loading conditions. Then, adaptation is done in the real environment. It has been found out from the experiments that the relative positioning error for both the cases are below 3%. The proposed models are experimentally verified on a real continuum manipulator.

Via

Access Paper or Ask Questions

Hybrid CNN Bi-LSTM neural network for Hyperspectral image classification

Feb 15, 2024

Alok Ranjan Sahoo, Pavan Chakraborty

Abstract:Hyper spectral images have drawn the attention of the researchers for its complexity to classify. It has nonlinear relation between the materials and the spectral information provided by the HSI image. Deep learning methods have shown superiority in learning this nonlinearity in comparison to traditional machine learning methods. Use of 3-D CNN along with 2-D CNN have shown great success for learning spatial and spectral features. However, it uses comparatively large number of parameters. Moreover, it is not effective to learn inter layer information. Hence, this paper proposes a neural network combining 3-D CNN, 2-D CNN and Bi-LSTM. The performance of this model has been tested on Indian Pines(IP) University of Pavia(PU) and Salinas Scene(SA) data sets. The results are compared with the state of-the-art deep learning-based models. This model performed better in all three datasets. It could achieve 99.83, 99.98 and 100 percent accuracy using only 30 percent trainable parameters of the state-of-art model in IP, PU and SA datasets respectively.

Via

Access Paper or Ask Questions

Guidance system for Visually Impaired Persons using Deep Learning and Optical flow

Oct 22, 2023

Shwetang Dubey, Alok Ranjan Sahoo, Pavan Chakraborty

Figure 1 for Guidance system for Visually Impaired Persons using Deep Learning and Optical flow

Figure 2 for Guidance system for Visually Impaired Persons using Deep Learning and Optical flow

Figure 3 for Guidance system for Visually Impaired Persons using Deep Learning and Optical flow

Figure 4 for Guidance system for Visually Impaired Persons using Deep Learning and Optical flow

Abstract:Visually impaired persons find it difficult to know about their surroundings while walking on a road. Walking sticks used by them can only give them information about the obstacles in the stick's proximity. Moreover, it is mostly effective in static or very slow-paced environments. Hence, this paper introduces a method to guide them in a busy street. To create such a system it is very important to know about the approaching object and its direction of approach. To achieve this objective we created a method in which the image frame received from the video is divided into three parts i.e. center, left, and right to know the direction of approach of the approaching object. Object detection is done using YOLOv3. Lucas Kanade's optical flow estimation method is used for the optical flow estimation and Depth-net is used for depth estimation. Using the depth information, object motion trajectory, and object category information, the model provides necessary information/warning to the person. This model has been tested in the real world to show its effectiveness.

Via

Access Paper or Ask Questions

Local Directional Gradient Pattern: A Local Descriptor for Face Recognition

Jan 03, 2022

Soumendu Chakraborty, Satish Kumar Singh, Pavan Chakraborty

Figure 1 for Local Directional Gradient Pattern: A Local Descriptor for Face Recognition

Figure 2 for Local Directional Gradient Pattern: A Local Descriptor for Face Recognition

Figure 3 for Local Directional Gradient Pattern: A Local Descriptor for Face Recognition

Figure 4 for Local Directional Gradient Pattern: A Local Descriptor for Face Recognition

Abstract:In this paper a local pattern descriptor in high order derivative space is proposed for face recognition. The proposed local directional gradient pattern (LDGP) is a 1D local micropattern computed by encoding the relationships between the higher order derivatives of the reference pixel in four distinct directions. The proposed descriptor identifies the relationship between the high order derivatives of the referenced pixel in four different directions to compute the micropattern which corresponds to the local feature. Proposed descriptor considerably reduces the length of the micropattern which consequently reduces the extraction time and matching time while maintaining the recognition rate. Results of the extensive experiments conducted on benchmark databases AT&T, Extended Yale B and CMU-PIE show that the proposed descriptor significantly reduces the extraction as well as matching time while the recognition rate is almost similar to the existing state of the art methods.

* Multimedia Tools and Applications, vol-76, no-1, pp. 1201-1216, (2017). (Springer) ISSN/ISBN: 1573-7721

Via

Access Paper or Ask Questions

Cascaded Asymmetric Local Pattern: A Novel Descriptor for Unconstrained Facial Image Recognition and Retrieval

Jan 03, 2022

Soumendu Chakraborty, Satish Kumar Singh, Pavan Chakraborty

Figure 1 for Cascaded Asymmetric Local Pattern: A Novel Descriptor for Unconstrained Facial Image Recognition and Retrieval

Figure 2 for Cascaded Asymmetric Local Pattern: A Novel Descriptor for Unconstrained Facial Image Recognition and Retrieval

Figure 3 for Cascaded Asymmetric Local Pattern: A Novel Descriptor for Unconstrained Facial Image Recognition and Retrieval

Figure 4 for Cascaded Asymmetric Local Pattern: A Novel Descriptor for Unconstrained Facial Image Recognition and Retrieval

Abstract:Feature description is one of the most frequently studied areas in the expert systems and machine learning. Effective encoding of the images is an essential requirement for accurate matching. These encoding schemes play a significant role in recognition and retrieval systems. Facial recognition systems should be effective enough to accurately recognize individuals under intrinsic and extrinsic variations of the system. The templates or descriptors used in these systems encode spatial relationships of the pixels in the local neighbourhood of an image. Features encoded using these hand crafted descriptors should be robust against variations such as; illumination, background, poses, and expressions. In this paper a novel hand crafted cascaded asymmetric local pattern (CALP) is proposed for retrieval and recognition facial image. The proposed descriptor uniquely encodes relationship amongst the neighbouring pixels in horizontal and vertical directions. The proposed encoding scheme has optimum feature length and shows significant improvement in accuracy under environmental and physiological changes in a facial image. State of the art hand crafted descriptors namely; LBP, LDGP, CSLBP, SLBP and CSLTP are compared with the proposed descriptor on most challenging datasets namely; Caltech-face, LFW, and CASIA-face-v5. Result analysis shows that, the proposed descriptor outperforms state of the art under uncontrolled variations in expressions, background, pose and illumination.

Via

Access Paper or Ask Questions

Local Quadruple Pattern: A Novel Descriptor for Facial Image Recognition and Retrieval

Jan 03, 2022

Soumendu Chakraborty, Satish Kumar Singh, Pavan Chakraborty

Figure 1 for Local Quadruple Pattern: A Novel Descriptor for Facial Image Recognition and Retrieval

Figure 2 for Local Quadruple Pattern: A Novel Descriptor for Facial Image Recognition and Retrieval

Figure 3 for Local Quadruple Pattern: A Novel Descriptor for Facial Image Recognition and Retrieval

Figure 4 for Local Quadruple Pattern: A Novel Descriptor for Facial Image Recognition and Retrieval

Abstract:In this paper a novel hand crafted local quadruple pattern (LQPAT) is proposed for facial image recognition and retrieval. Most of the existing hand-crafted descriptors encodes only a limited number of pixels in the local neighbourhood. Under unconstrained environment the performance of these descriptors tends to degrade drastically. The major problem in increasing the local neighbourhood is that, it also increases the feature length of the descriptor. The proposed descriptor try to overcome these problems by defining an efficient encoding structure with optimal feature length. The proposed descriptor encodes relations amongst the neighbours in quadruple space. Two micro patterns are computed from the local relationships to form the descriptor. The retrieval and recognition accuracies of the proposed descriptor has been compared with state of the art hand crafted descriptors on bench mark databases namely; Caltech-face, LFW, Colour-FERET, and CASIA-face-v5. Result analysis shows that the proposed descriptor performs well under uncontrolled variations in pose, illumination, background and expressions.

* Computers & Electrical Engineering, vol-62, pp. 92-104, (2017). (Elsevier) ISSN/ISBN: 0045-7906
* arXiv admin note: substantial text overlap with arXiv:2201.00504, arXiv:2201.00511

Via

Access Paper or Ask Questions

Centre Symmetric Quadruple Pattern: A Novel Descriptor for Facial Image Recognition and Retrieval

Jan 03, 2022

Soumendu Chakraborty, Satish Kumar Singh, Pavan Chakraborty

Figure 1 for Centre Symmetric Quadruple Pattern: A Novel Descriptor for Facial Image Recognition and Retrieval

Figure 2 for Centre Symmetric Quadruple Pattern: A Novel Descriptor for Facial Image Recognition and Retrieval

Figure 3 for Centre Symmetric Quadruple Pattern: A Novel Descriptor for Facial Image Recognition and Retrieval

Figure 4 for Centre Symmetric Quadruple Pattern: A Novel Descriptor for Facial Image Recognition and Retrieval

Abstract:Facial features are defined as the local relationships that exist amongst the pixels of a facial image. Hand-crafted descriptors identify the relationships of the pixels in the local neighbourhood defined by the kernel. Kernel is a two dimensional matrix which is moved across the facial image. Distinctive information captured by the kernel with limited number of pixel achieves satisfactory recognition and retrieval accuracies on facial images taken under constrained environment (controlled variations in light, pose, expressions, and background). To achieve similar accuracies under unconstrained environment local neighbourhood has to be increased, in order to encode more pixels. Increasing local neighbourhood also increases the feature length of the descriptor. In this paper we propose a hand-crafted descriptor namely Centre Symmetric Quadruple Pattern (CSQP), which is structurally symmetric and encodes the facial asymmetry in quadruple space. The proposed descriptor efficiently encodes larger neighbourhood with optimal number of binary bits. It has been shown using average entropy, computed over feature images encoded with the proposed descriptor, that the CSQP captures more meaningful information as compared to state of the art descriptors. The retrieval and recognition accuracies of the proposed descriptor has been compared with state of the art hand-crafted descriptors (CSLBP, CSLTP, LDP, LBP, SLBP and LDGP) on bench mark databases namely; LFW, Colour-FERET, and CASIA-face-v5. Result analysis shows that the proposed descriptor performs well under controlled as well as uncontrolled variations in pose, illumination, background and expressions.

* Pattern Recognition Letters, vol-115, pp.50-58, (2018). (Elsevier) ISSN/ISBN: 0167-8655

Via

Access Paper or Ask Questions

Local Gradient Hexa Pattern: A Descriptor for Face Recognition and Retrieval

Jan 03, 2022

Soumendu Chakraborty, Satish Kumar Singh, Pavan Chakraborty

Figure 1 for Local Gradient Hexa Pattern: A Descriptor for Face Recognition and Retrieval

Figure 2 for Local Gradient Hexa Pattern: A Descriptor for Face Recognition and Retrieval

Figure 3 for Local Gradient Hexa Pattern: A Descriptor for Face Recognition and Retrieval

Figure 4 for Local Gradient Hexa Pattern: A Descriptor for Face Recognition and Retrieval

Abstract:Local descriptors used in face recognition are robust in a sense that these descriptors perform well in varying pose, illumination and lighting conditions. Accuracy of these descriptors depends on the precision of mapping the relationship that exists in the local neighborhood of a facial image into microstructures. In this paper a local gradient hexa pattern (LGHP) is proposed that identifies the relationship amongst the reference pixel and its neighboring pixels at different distances across different derivative directions. Discriminative information exists in the local neighborhood as well as in different derivative directions. Proposed descriptor effectively transforms these relationships into binary micropatterns discriminating interclass facial images with optimal precision. Recognition and retrieval performance of the proposed descriptor has been compared with state-of-the-art descriptors namely LDP and LVP over the most challenging and benchmark facial image databases, i.e. Cropped Extended Yale-B, CMU-PIE, color-FERET, and LFW. The proposed descriptor has better recognition as well as retrieval rates compared to state-of-the-art descriptors.

* IEEE Transactions on Circuits and Systems for Video Technology, vol-28, no-1, pp. 171-180, (2018). ISSN/ISBN: 1051-8215

Via

Access Paper or Ask Questions