Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Daksh Thapar

From Forks to Forceps: A New Framework for Instance Segmentation of Surgical Instruments

Nov 26, 2022

Britty Baby, Daksh Thapar, Mustafa Chasmai, Tamajit Banerjee, Kunal Dargan, Ashish Suri, Subhashis Banerjee, Chetan Arora

Figure 1 for From Forks to Forceps: A New Framework for Instance Segmentation of Surgical Instruments

Figure 2 for From Forks to Forceps: A New Framework for Instance Segmentation of Surgical Instruments

Figure 3 for From Forks to Forceps: A New Framework for Instance Segmentation of Surgical Instruments

Figure 4 for From Forks to Forceps: A New Framework for Instance Segmentation of Surgical Instruments

Abstract:Minimally invasive surgeries and related applications demand surgical tool classification and segmentation at the instance level. Surgical tools are similar in appearance and are long, thin, and handled at an angle. The fine-tuning of state-of-the-art (SOTA) instance segmentation models trained on natural images for instrument segmentation has difficulty discriminating instrument classes. Our research demonstrates that while the bounding box and segmentation mask are often accurate, the classification head mis-classifies the class label of the surgical instrument. We present a new neural network framework that adds a classification module as a new stage to existing instance segmentation models. This module specializes in improving the classification of instrument masks generated by the existing model. The module comprises multi-scale mask attention, which attends to the instrument region and masks the distracting background features. We propose training our classifier module using metric learning with arc loss to handle low inter-class variance of surgical instruments. We conduct exhaustive experiments on the benchmark datasets EndoVis2017 and EndoVis2018. We demonstrate that our method outperforms all (more than 18) SOTA methods compared with, and improves the SOTA performance by at least 12 points (20%) on the EndoVis2017 benchmark challenge and generalizes effectively across the datasets.

* WACV 2023

Via

Access Paper or Ask Questions

PoshakNet: Framework for matching dresses from real-life photos using GAN and Siamese Network

Nov 11, 2019

Abhigyan Khaund, Daksh Thapar, Aditya Nigam

Figure 1 for PoshakNet: Framework for matching dresses from real-life photos using GAN and Siamese Network

Figure 2 for PoshakNet: Framework for matching dresses from real-life photos using GAN and Siamese Network

Figure 3 for PoshakNet: Framework for matching dresses from real-life photos using GAN and Siamese Network

Figure 4 for PoshakNet: Framework for matching dresses from real-life photos using GAN and Siamese Network

Abstract:Online garment shopping has gained many customers in recent years. Describing a dress using keywords does not always yield the proper results, which in turn leads to dissatisfaction of customers. A visual search based system will be enormously beneficent to the industry. Hence, we propose a framework that can retrieve similar clothes that can be found in an image. The first task is to extract the garment from the input image (street photo). There are various challenges for that, including pose, illumination, and background clutter. We use a Generative Adversarial Network for the task of retrieving the garment that the person in the image was wearing. It has been shown that GAN can retrieve the garment very efficiently despite the challenges of street photos. Finally, a siamese based matching system takes the retrieved cloth image and matches it with the clothes in the dataset, giving us the top k matches. We take a pre-trained inception-ResNet v1 module as a siamese network (trained using triplet loss for face detection) and fine-tune it on the shopping dataset using center loss. The dataset has been collected inhouse. For training the GAN, we use the LookBook dataset, which is publically available.

* Accepted in NCVPRIPG 2019

Via

Access Paper or Ask Questions

FKIMNet: A Finger Dorsal Image Matching Network Comparing Component (Major, Minor and Nail) Matching with Holistic (Finger Dorsal) Matching

Apr 02, 2019

Daksh Thapar, Gaurav Jaswal, Aditya Nigam

Figure 1 for FKIMNet: A Finger Dorsal Image Matching Network Comparing Component (Major, Minor and Nail) Matching with Holistic (Finger Dorsal) Matching

Figure 2 for FKIMNet: A Finger Dorsal Image Matching Network Comparing Component (Major, Minor and Nail) Matching with Holistic (Finger Dorsal) Matching

Figure 3 for FKIMNet: A Finger Dorsal Image Matching Network Comparing Component (Major, Minor and Nail) Matching with Holistic (Finger Dorsal) Matching

Figure 4 for FKIMNet: A Finger Dorsal Image Matching Network Comparing Component (Major, Minor and Nail) Matching with Holistic (Finger Dorsal) Matching

Abstract:Current finger knuckle image recognition systems, often require users to place fingers' major or minor joints flatly towards the capturing sensor. To extend these systems for user non-intrusive application scenarios, such as consumer electronics, forensic, defence etc, we suggest matching the full dorsal fingers, rather than the major/ minor region of interest (ROI) alone. In particular, this paper makes a comprehensive study on the comparisons between full finger and fusion of finger ROI's for finger knuckle image recognition. These experiments suggest that using full-finger, provides a more elegant solution. Addressing the finger matching problem, we propose a CNN (convolutional neural network) which creates a $128$-D feature embedding of an image. It is trained via. triplet loss function, which enforces the L2 distance between the embeddings of the same subject to be approaching zero, whereas the distance between any 2 embeddings of different subjects to be at least a margin. For precise training of the network, we use dynamic adaptive margin, data augmentation, and hard negative mining. In distinguished experiments, the individual performance of finger, as well as weighted sum score level fusion of major knuckle, minor knuckle, and nail modalities have been computed, justifying our assumption to consider full finger as biometrics instead of its counterparts. The proposed method is evaluated using two publicly available finger knuckle image datasets i.e., PolyU FKP dataset and PolyU Contactless FKI Datasets.

* Accepted in IJCNN 2019

Via

Access Paper or Ask Questions

Multiscale CNN based Deep Metric Learning for Bioacoustic Classification: Overcoming Training Data Scarcity Using Dynamic Triplet Loss

Mar 27, 2019

Anshul Thakur, Daksh Thapar, Padmanabhan Rajan, Aditya Nigam

Figure 1 for Multiscale CNN based Deep Metric Learning for Bioacoustic Classification: Overcoming Training Data Scarcity Using Dynamic Triplet Loss

Figure 2 for Multiscale CNN based Deep Metric Learning for Bioacoustic Classification: Overcoming Training Data Scarcity Using Dynamic Triplet Loss

Figure 3 for Multiscale CNN based Deep Metric Learning for Bioacoustic Classification: Overcoming Training Data Scarcity Using Dynamic Triplet Loss

Figure 4 for Multiscale CNN based Deep Metric Learning for Bioacoustic Classification: Overcoming Training Data Scarcity Using Dynamic Triplet Loss

Abstract:This paper proposes multiscale convolutional neural network (CNN)-based deep metric learning for bioacoustic classification, under low training data conditions. The proposed CNN is characterized by the utilization of four different filter sizes at each level to analyze input feature maps. This multiscale nature helps in describing different bioacoustic events effectively: smaller filters help in learning the finer details of bioacoustic events, whereas, larger filters help in analyzing a larger context leading to global details. A dynamic triplet loss is employed in the proposed CNN architecture to learn a transformation from the input space to the embedding space, where classification is performed. The triplet loss helps in learning this transformation by analyzing three examples, referred to as triplets, at a time where intra-class distance is minimized while maximizing the inter-class separation by a dynamically increasing margin. The number of possible triplets increases cubically with the dataset size, making triplet loss more suitable than the softmax cross-entropy loss in low training data conditions. Experiments on three different publicly available datasets show that the proposed framework performs better than existing bioacoustic classification frameworks. Experimental results also confirm the superiority of the triplet loss over the cross-entropy loss in low training data conditions

* Under Review at JASA. Primitive version of paper. We are still working on getting better performances out of the comparative methods

Via

Access Paper or Ask Questions

PVSNet: Palm Vein Authentication Siamese Network Trained using Triplet Loss and Adaptive Hard Mining by Learning Enforced Domain Specific Features

Dec 15, 2018

Daksh Thapar, Gaurav Jaswal, Aditya Nigam, Vivek Kanhangad

Figure 1 for PVSNet: Palm Vein Authentication Siamese Network Trained using Triplet Loss and Adaptive Hard Mining by Learning Enforced Domain Specific Features

Figure 2 for PVSNet: Palm Vein Authentication Siamese Network Trained using Triplet Loss and Adaptive Hard Mining by Learning Enforced Domain Specific Features

Figure 3 for PVSNet: Palm Vein Authentication Siamese Network Trained using Triplet Loss and Adaptive Hard Mining by Learning Enforced Domain Specific Features

Figure 4 for PVSNet: Palm Vein Authentication Siamese Network Trained using Triplet Loss and Adaptive Hard Mining by Learning Enforced Domain Specific Features

Abstract:Designing an end-to-end deep learning network to match the biometric features with limited training samples is an extremely challenging task. To address this problem, we propose a new way to design an end-to-end deep CNN framework i.e., PVSNet that works in two major steps: first, an encoder-decoder network is used to learn generative domain-specific features followed by a Siamese network in which convolutional layers are pre-trained in an unsupervised fashion as an autoencoder. The proposed model is trained via triplet loss function that is adjusted for learning feature embeddings in a way that minimizes the distance between embedding-pairs from the same subject and maximizes the distance with those from different subjects, with a margin. In particular, a triplet Siamese matching network using an adaptive margin based hard negative mining has been suggested. The hyper-parameters associated with the training strategy, like the adaptive margin, have been tuned to make the learning more effective on biometric datasets. In extensive experimentation, the proposed network outperforms most of the existing deep learning solutions on three type of typical vein datasets which clearly demonstrates the effectiveness of our proposed method.

* Accepted in 5th IEEE International Conference on Identity, Security and Behavior Analysis (ISBA), 2019, Hyderabad, India

Via

Access Paper or Ask Questions

BrainSegNet : A Segmentation Network for Human Brain Fiber Tractography Data into Anatomically Meaningful Clusters

Oct 14, 2017

Tushar Gupta, Shreyas Malakarjun Patil, Mukkaram Tailor, Daksh Thapar, Aditya Nigam

Figure 1 for BrainSegNet : A Segmentation Network for Human Brain Fiber Tractography Data into Anatomically Meaningful Clusters

Figure 2 for BrainSegNet : A Segmentation Network for Human Brain Fiber Tractography Data into Anatomically Meaningful Clusters

Figure 3 for BrainSegNet : A Segmentation Network for Human Brain Fiber Tractography Data into Anatomically Meaningful Clusters

Figure 4 for BrainSegNet : A Segmentation Network for Human Brain Fiber Tractography Data into Anatomically Meaningful Clusters

Abstract:The segregation of brain fiber tractography data into distinct and anatomically meaningful clusters can help to comprehend the complex brain structure and early investigation and management of various neural disorders. We propose a novel stacked bidirectional long short-term memory(LSTM) based segmentation network, (BrainSegNet) for human brain fiber tractography data classification. We perform a two-level hierarchical classification a) White vs Grey matter (Macro) and b) White matter clusters (Micro). BrainSegNet is trained over three brain tractography data having over 250,000 fibers each. Our experimental evaluation shows that our model achieves state-of-the-art results. We have performed inter as well as intra class testing over three patient's brain tractography data and achieved a high classification accuracy for both macro and micro levels both under intra as well as inter brain testing scenario.

* Deep Learning in Irregular Domains - British Machine Vision Conference (DLID-BMVC)

Via

Access Paper or Ask Questions

VGR-Net: A View Invariant Gait Recognition Network

Oct 13, 2017

Daksh Thapar, Divyansh Aggarwal, Punjal Agarwal, Aditya Nigam

Figure 1 for VGR-Net: A View Invariant Gait Recognition Network

Figure 2 for VGR-Net: A View Invariant Gait Recognition Network

Figure 3 for VGR-Net: A View Invariant Gait Recognition Network

Figure 4 for VGR-Net: A View Invariant Gait Recognition Network

Abstract:Biometric identification systems have become immensely popular and important because of their high reliability and efficiency. However person identification at a distance, still remains a challenging problem. Gait can be seen as an essential biometric feature for human recognition and identification. It can be easily acquired from a distance and does not require any user cooperation thus making it suitable for surveillance. But the task of recognizing an individual using gait can be adversely affected by varying view points making this task more and more challenging. Our proposed approach tackles this problem by identifying spatio-temporal features and performing extensive experimentation and training mechanism. In this paper, we propose a 3-D Convolution Deep Neural Network for person identification using gait under multiple view. It is a 2-stage network, in which we have a classification network that initially identifies the viewing point angle. After that another set of networks (one for each angle) has been trained to identify the person under a particular viewing angle. We have tested this network over CASIA-B publicly available database and have achieved state-of-the-art results. The proposed system is much more efficient in terms of time and space and performing better for almost all angles.

* Accepted in ISBA (IEEE International conference on Identity, Security and Behaviour Analysis)-2018

Via

Access Paper or Ask Questions

UBSegNet: Unified Biometric Region of Interest Segmentation Network

Sep 26, 2017

Ranjeet Ranjan Jha, Daksh Thapar, Shreyas Malakarjun Patil, Aditya Nigam

Figure 1 for UBSegNet: Unified Biometric Region of Interest Segmentation Network

Figure 2 for UBSegNet: Unified Biometric Region of Interest Segmentation Network

Figure 3 for UBSegNet: Unified Biometric Region of Interest Segmentation Network

Figure 4 for UBSegNet: Unified Biometric Region of Interest Segmentation Network

Abstract:Digital human identity management, can now be seen as a social necessity, as it is essentially required in almost every public sector such as, financial inclusions, security, banking, social networking e.t.c. Hence, in today's rampantly emerging world with so many adversarial entities, relying on a single biometric trait is being too optimistic. In this paper, we have proposed a novel end-to-end, Unified Biometric ROI Segmentation Network (UBSegNet), for extracting region of interest from five different biometric traits viz. face, iris, palm, knuckle and 4-slap fingerprint. The architecture of the proposed UBSegNet consists of two stages: (i) Trait classification and (ii) Trait localization. For these stages, we have used a state of the art region based convolutional neural network (RCNN), comprising of three major parts namely convolutional layers, region proposal network (RPN) along with classification and regression heads. The model has been evaluated over various huge publicly available biometric databases. To the best of our knowledge this is the first unified architecture proposed, segmenting multiple biometric traits. It has been tested over around 5000 * 5 = 25,000 images (5000 images per trait) and produces very good results. Our work on unified biometric segmentation, opens up the vast opportunities in the field of multiple biometric traits based authentication systems.

* 4th Asian Conference on Pattern Recognition (ACPR 2017)

Via

Access Paper or Ask Questions