Abstract: A person is usually characterized by descriptors such as age, gender, height, cloth type, pattern, and color. Such descriptors are known as attributes and/or soft-biometrics. They bridge the semantic gap between a person's description and retrieval in video surveillance. Retrieving a specific person from a semantic-description query has important applications in video surveillance. Using computer vision to fully automate the person retrieval task has been gathering interest within the research community. However, the current trend mainly focuses on retrieving persons with image-based queries, which have major limitations for practical usage. Instead of an image query, in this paper we study the problem of person retrieval in video surveillance with a semantic description. To solve this problem, we develop a deep learning-based cascade filtering approach (PeR-ViS), which uses Mask R-CNN [14] (person detection and instance segmentation) and DenseNet-161 [16] (soft-biometric classification). On the standard person retrieval dataset of SoftBioSearch [6], we achieve 0.566 Average IoU and 0.792 %w $IoU > 0.4$, surpassing the current state-of-the-art by a large margin. We hope our simple, reproducible, and effective approach will ease future research in the domain of person retrieval in video surveillance. The source code and pretrained weights are available at https://parshwa1999.github.io/PeR-ViS/.
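The cascade filtering idea above can be sketched as successive attribute filters over detected candidates. This is a minimal illustration, not the paper's implementation: the candidate dictionaries, attribute names, and filter ordering are assumptions for demonstration.

```python
# Hypothetical sketch of a cascade filter over detected persons: each stage
# prunes the candidate set using one soft-biometric attribute, so later,
# costlier classifiers only run on the survivors of earlier stages.

def cascade_filter(candidates, query):
    """Keep only candidates whose predicted attributes match the query.

    candidates: list of dicts of predicted soft-biometric attributes.
    query: dict of attribute -> required value, applied as successive filters.
    """
    survivors = list(candidates)
    for attr, wanted in query.items():
        survivors = [c for c in survivors if c.get(attr) == wanted]
    return survivors

# Toy detections from a person detector (illustrative attribute values).
detections = [
    {"id": 1, "gender": "male", "torso_color": "red"},
    {"id": 2, "gender": "female", "torso_color": "red"},
    {"id": 3, "gender": "female", "torso_color": "blue"},
]
matches = cascade_filter(detections, {"gender": "female", "torso_color": "red"})
```

In a real pipeline the cheapest and most reliable attribute would be placed first so most candidates are discarded early.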
Abstract: A person is commonly described by attributes such as height, build, cloth color, cloth type, and gender. Such attributes are known as soft biometrics. They bridge the semantic gap between a human description and person retrieval in surveillance video. This paper proposes a deep learning-based linear filtering approach for person retrieval using height, cloth color, and gender. The proposed approach uses Mask R-CNN for pixel-wise person segmentation, which removes background clutter and provides a precise boundary around the person. The color and gender models are fine-tuned from AlexNet, and the algorithm is tested on the SoftBioSearch dataset. It achieves good accuracy for person retrieval from a semantic query under challenging conditions.
Abstract: This paper presents a technique to improve human detection in still images using deep learning. Our method, ViS-HuD, first computes a visual saliency map from the image; the input image is then multiplied element-wise by this map, and the product is fed to a Convolutional Neural Network (CNN) that detects humans in the image. The visual saliency map is generated using ML-Net, and human detection is carried out using DetectNet. ML-Net is pre-trained on SALICON for visual saliency detection, while DetectNet is pre-trained on the ImageNet database for image classification. The CNNs of ViS-HuD were trained on two challenging databases: Penn-Fudan and the TUD-Brussels benchmark. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on the Penn-Fudan dataset with 91.4% human detection accuracy, and an average miss rate of 53% on the TUD-Brussels benchmark.
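The saliency-weighting step described above (multiplying the input image element-wise by its saliency map) can be sketched as follows; the normalization to [0, 1] and the broadcast across color channels are our assumptions, since the abstract does not give these details.

```python
import numpy as np

def saliency_weight(image, saliency):
    """Weight an RGB image element-wise by a saliency map scaled to [0, 1]."""
    rng = saliency.max() - saliency.min()
    s = (saliency - saliency.min()) / rng if rng > 0 else saliency
    # Broadcast the single-channel map across the colour channels.
    return (image.astype(np.float32) * s[..., None]).astype(np.uint8)

img = np.full((4, 4, 3), 200, dtype=np.uint8)   # uniform grey image
sal = np.zeros((4, 4), dtype=np.float32)
sal[1:3, 1:3] = 1.0                             # salient centre block
out = saliency_weight(img, sal)                 # background pixels go to 0
```

The effect is that non-salient regions are suppressed before the detector sees the image, which is the intuition behind feeding the product to the detection CNN.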
Abstract: This paper explores the use of visual saliency to classify age, gender, and facial expression in facial images. For this multi-task classification, we propose VEGAC, a method based on visual saliency. Using the Deep Multi-Level Network [1] and an off-the-shelf face detector [2], our method first detects the face in the test image and extracts CNN predictions on the cropped face. The CNNs of VEGAC were fine-tuned on a dataset collected from different benchmarks. Our convolutional neural network (CNN) uses the VGG-16 architecture [3] and is pre-trained on ImageNet for image classification. We demonstrate the usefulness of our method for age estimation, gender classification, and facial expression classification, obtaining competitive results on the selected benchmarks. All our models and code will be publicly available.
Abstract: In this paper, we tackle gender classification in facial images with deep learning. Our convolutional neural networks (CNNs) use the VGG-16 architecture [1] and are pre-trained on ImageNet for image classification. Our proposed method (2^B3^C) first detects the face in the image, increases the margin of the detected face by 50%, crops the face with two boxes and three crop schemes (Left, Middle, and Right crop), and extracts CNN predictions on the cropped regions. The CNNs of our method are fine-tuned on the Adience and LFW datasets with gender annotations. We show the effectiveness of our method by achieving 90.8% classification accuracy on Adience and a competitive 95.3% classification accuracy on the LFW dataset. In addition, considering real-time scenarios, our gender classification system runs at 7-10 fps (frames per second) on a GPU.
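The margin expansion and Left/Middle/Right cropping can be sketched as below. The exact crop geometry of 2^B3^C is not spelled out in the abstract, so this is one plausible reading: expand the face box by 50%, then slide a box-width window to the left, middle, and right of the expanded region.

```python
import numpy as np

def three_crops(image, box, margin=0.5):
    """Expand a face box by `margin` and take Left/Middle/Right crops.

    box is (x, y, w, h); each crop keeps the original box width. This
    geometry is an illustrative guess, not the paper's exact scheme.
    """
    x, y, w, h = box
    H, W = image.shape[:2]
    # Expanded box, grown by margin and clipped to the image bounds.
    ex = max(0, int(x - margin * w / 2))
    ey = max(0, int(y - margin * h / 2))
    ew = min(W - ex, int(w * (1 + margin)))
    eh = min(H - ey, int(h * (1 + margin)))
    region = image[ey:ey + eh, ex:ex + ew]
    # Left, middle, and right windows of the original box width.
    return [region[:, x0:x0 + w] for x0 in (0, (ew - w) // 2, ew - w)]

face_box = (30, 30, 40, 40)                      # hypothetical detector output
crops = three_crops(np.zeros((100, 100, 3), dtype=np.uint8), face_box)
```

Averaging the CNN predictions over such overlapping crops is a common way to make the classifier robust to imperfect face localization.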
Abstract: We have developed a deep learning network for the classification of different flowers. For this, we use the Visual Geometry Group's 102-category flower dataset from the University of Oxford, which contains 8,189 images of 102 different flowers. The method is divided into two parts: image segmentation and classification. We compare the performance of two convolutional neural network architectures, GoogLeNet and AlexNet, for classification. Keeping the hyperparameters the same for both architectures, we find that the top-1 and top-5 accuracies of GoogLeNet are 47.15% and 69.17% respectively, whereas those of AlexNet are 43.39% and 68.68% respectively. These results are very good compared to the random classification accuracy of 0.98%. This method for flower classification can be implemented in real-time applications and can help botanists in their research as well as camping enthusiasts.
Abstract: Surveillance based on computer vision has become a major necessity in the current era. Most surveillance systems operate on visible-light imaging, but their performance is limited by factors such as variation in light intensity during the daytime, and above all by the need to process images in low light, as in nighttime surveillance. In this paper, we propose a novel approach for human detection using a FLIR (Forward Looking Infrared) camera. Because the sensing principle is based on thermal radiation in the near-IR region, humans can be detected in an image captured by a FLIR camera even in low light. The proposed method processes thermal images using HOG (Histogram of Oriented Gradients) feature extraction along with some enhancements. At its core is an adaptive background subtraction algorithm that works in association with the HOG technique. This method reduces execution time and improves precision and other parameters, resulting in improved overall accuracy of the human detection system.
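The adaptive background subtraction step paired with HOG can be sketched as an exponential running-average background model; this is one plausible reading of "adaptive background subtraction", since the abstract does not specify the update rule, and the learning rate and threshold below are illustrative.

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Adapt the background model toward the current frame
    via an exponential running average (illustrative rule)."""
    return (1 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=25.0):
    """Mark pixels whose thermal intensity deviates from the background."""
    return np.abs(frame.astype(np.float32) - bg) > thresh

bg = np.zeros((4, 4), dtype=np.float32)          # empty thermal scene
frame = bg.copy()
frame[1:3, 1:3] = 255.0                          # warm object enters
mask = foreground_mask(bg, frame)                # candidate human region
bg = update_background(bg, frame)                # background slowly adapts
```

In the full pipeline, HOG descriptors would then be computed only on the foreground regions flagged by the mask, which is what keeps execution time low.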