Abstract:In recent years, the challenge of 3D shape analysis within point cloud data has gathered significant attention in computer vision. Addressing the complexities of effective 3D information representation and meaningful feature extraction for classification tasks remains crucial. This paper presents Point-GR, a novel deep learning architecture designed explicitly to transform unordered raw point clouds into higher dimensions while preserving local geometric features. It introduces residual-based learning within the network to mitigate the point permutation issues in point cloud data. The proposed Point-GR network significantly reduced the number of network parameters in Classification and Part-Segmentation compared to baseline graph-based networks. Notably, the Point-GR model achieves a state-of-the-art scene segmentation mean IoU of 73.47% on the S3DIS benchmark dataset, showcasing its effectiveness. Furthermore, the model shows competitive results in Classification and Part-Segmentation tasks.
Abstract:Data driven control of a continuum manipulator requires a lot of data for training but generating sufficient amount of real time data is not cost efficient. Random actuation of the manipulator can also be unsafe sometimes. Meta learning has been used successfully to adapt to a new environment. Hence, this paper tries to solve the above mentioned problem using meta learning. We consider two cases for that. First, this paper proposes a method to use simulation data for training the model using MAML(Model-Agnostic Meta-Learning). Then, it adapts to the real world using gradient steps. Secondly,if the simulation model is not available or difficult to formulate, then we propose a CGAN(Conditional Generative adversial network)-MAML based method for it. The model is trained using a small amount of real time data and augmented data for different loading conditions. Then, adaptation is done in the real environment. It has been found out from the experiments that the relative positioning error for both the cases are below 3%. The proposed models are experimentally verified on a real continuum manipulator.
Abstract:Hyper spectral images have drawn the attention of the researchers for its complexity to classify. It has nonlinear relation between the materials and the spectral information provided by the HSI image. Deep learning methods have shown superiority in learning this nonlinearity in comparison to traditional machine learning methods. Use of 3-D CNN along with 2-D CNN have shown great success for learning spatial and spectral features. However, it uses comparatively large number of parameters. Moreover, it is not effective to learn inter layer information. Hence, this paper proposes a neural network combining 3-D CNN, 2-D CNN and Bi-LSTM. The performance of this model has been tested on Indian Pines(IP) University of Pavia(PU) and Salinas Scene(SA) data sets. The results are compared with the state of-the-art deep learning-based models. This model performed better in all three datasets. It could achieve 99.83, 99.98 and 100 percent accuracy using only 30 percent trainable parameters of the state-of-art model in IP, PU and SA datasets respectively.
Abstract:Visually impaired persons find it difficult to know about their surroundings while walking on a road. Walking sticks used by them can only give them information about the obstacles in the stick's proximity. Moreover, it is mostly effective in static or very slow-paced environments. Hence, this paper introduces a method to guide them in a busy street. To create such a system it is very important to know about the approaching object and its direction of approach. To achieve this objective we created a method in which the image frame received from the video is divided into three parts i.e. center, left, and right to know the direction of approach of the approaching object. Object detection is done using YOLOv3. Lucas Kanade's optical flow estimation method is used for the optical flow estimation and Depth-net is used for depth estimation. Using the depth information, object motion trajectory, and object category information, the model provides necessary information/warning to the person. This model has been tested in the real world to show its effectiveness.
Abstract:In this paper a local pattern descriptor in high order derivative space is proposed for face recognition. The proposed local directional gradient pattern (LDGP) is a 1D local micropattern computed by encoding the relationships between the higher order derivatives of the reference pixel in four distinct directions. The proposed descriptor identifies the relationship between the high order derivatives of the referenced pixel in four different directions to compute the micropattern which corresponds to the local feature. Proposed descriptor considerably reduces the length of the micropattern which consequently reduces the extraction time and matching time while maintaining the recognition rate. Results of the extensive experiments conducted on benchmark databases AT&T, Extended Yale B and CMU-PIE show that the proposed descriptor significantly reduces the extraction as well as matching time while the recognition rate is almost similar to the existing state of the art methods.
Abstract:Feature description is one of the most frequently studied areas in the expert systems and machine learning. Effective encoding of the images is an essential requirement for accurate matching. These encoding schemes play a significant role in recognition and retrieval systems. Facial recognition systems should be effective enough to accurately recognize individuals under intrinsic and extrinsic variations of the system. The templates or descriptors used in these systems encode spatial relationships of the pixels in the local neighbourhood of an image. Features encoded using these hand crafted descriptors should be robust against variations such as; illumination, background, poses, and expressions. In this paper a novel hand crafted cascaded asymmetric local pattern (CALP) is proposed for retrieval and recognition facial image. The proposed descriptor uniquely encodes relationship amongst the neighbouring pixels in horizontal and vertical directions. The proposed encoding scheme has optimum feature length and shows significant improvement in accuracy under environmental and physiological changes in a facial image. State of the art hand crafted descriptors namely; LBP, LDGP, CSLBP, SLBP and CSLTP are compared with the proposed descriptor on most challenging datasets namely; Caltech-face, LFW, and CASIA-face-v5. Result analysis shows that, the proposed descriptor outperforms state of the art under uncontrolled variations in expressions, background, pose and illumination.
Abstract:In this paper a novel hand crafted local quadruple pattern (LQPAT) is proposed for facial image recognition and retrieval. Most of the existing hand-crafted descriptors encodes only a limited number of pixels in the local neighbourhood. Under unconstrained environment the performance of these descriptors tends to degrade drastically. The major problem in increasing the local neighbourhood is that, it also increases the feature length of the descriptor. The proposed descriptor try to overcome these problems by defining an efficient encoding structure with optimal feature length. The proposed descriptor encodes relations amongst the neighbours in quadruple space. Two micro patterns are computed from the local relationships to form the descriptor. The retrieval and recognition accuracies of the proposed descriptor has been compared with state of the art hand crafted descriptors on bench mark databases namely; Caltech-face, LFW, Colour-FERET, and CASIA-face-v5. Result analysis shows that the proposed descriptor performs well under uncontrolled variations in pose, illumination, background and expressions.
Abstract:Facial features are defined as the local relationships that exist amongst the pixels of a facial image. Hand-crafted descriptors identify the relationships of the pixels in the local neighbourhood defined by the kernel. Kernel is a two dimensional matrix which is moved across the facial image. Distinctive information captured by the kernel with limited number of pixel achieves satisfactory recognition and retrieval accuracies on facial images taken under constrained environment (controlled variations in light, pose, expressions, and background). To achieve similar accuracies under unconstrained environment local neighbourhood has to be increased, in order to encode more pixels. Increasing local neighbourhood also increases the feature length of the descriptor. In this paper we propose a hand-crafted descriptor namely Centre Symmetric Quadruple Pattern (CSQP), which is structurally symmetric and encodes the facial asymmetry in quadruple space. The proposed descriptor efficiently encodes larger neighbourhood with optimal number of binary bits. It has been shown using average entropy, computed over feature images encoded with the proposed descriptor, that the CSQP captures more meaningful information as compared to state of the art descriptors. The retrieval and recognition accuracies of the proposed descriptor has been compared with state of the art hand-crafted descriptors (CSLBP, CSLTP, LDP, LBP, SLBP and LDGP) on bench mark databases namely; LFW, Colour-FERET, and CASIA-face-v5. Result analysis shows that the proposed descriptor performs well under controlled as well as uncontrolled variations in pose, illumination, background and expressions.
Abstract:Local descriptors used in face recognition are robust in a sense that these descriptors perform well in varying pose, illumination and lighting conditions. Accuracy of these descriptors depends on the precision of mapping the relationship that exists in the local neighborhood of a facial image into microstructures. In this paper a local gradient hexa pattern (LGHP) is proposed that identifies the relationship amongst the reference pixel and its neighboring pixels at different distances across different derivative directions. Discriminative information exists in the local neighborhood as well as in different derivative directions. Proposed descriptor effectively transforms these relationships into binary micropatterns discriminating interclass facial images with optimal precision. Recognition and retrieval performance of the proposed descriptor has been compared with state-of-the-art descriptors namely LDP and LVP over the most challenging and benchmark facial image databases, i.e. Cropped Extended Yale-B, CMU-PIE, color-FERET, and LFW. The proposed descriptor has better recognition as well as retrieval rates compared to state-of-the-art descriptors.
Abstract:In this paper R-Theta Local Neighborhood Pattern (RTLNP) is proposed for facial image retrieval. RTLNP exploits relationships amongst the pixels in local neighborhood of the reference pixel at different angular and radial widths. The proposed encoding scheme divides the local neighborhood into sectors of equal angular width. These sectors are again divided into subsectors of two radial widths. Average grayscales values of these two subsectors are encoded to generate the micropatterns. Performance of the proposed descriptor has been evaluated and results are compared with the state of the art descriptors e.g. LBP, LTP, CSLBP, CSLTP, Sobel-LBP, LTCoP, LMeP, LDP, LTrP, MBLBP, BRINT and SLBP. The most challenging facial constrained and unconstrained databases, namely; AT&T, CARIA-Face-V5-Cropped, LFW, and Color FERET have been used for showing the efficiency of the proposed descriptor. Proposed descriptor is also tested on near infrared (NIR) face databases; CASIA NIR-VIS 2.0 and PolyU-NIRFD to explore its potential with respect to NIR facial images. Better retrieval rates of RTLNP as compared to the existing state of the art descriptors show the effectiveness of the descriptor