Abstract:This work explores how human judgement about salient regions of an image can be introduced into deep convolutional neural network (DCNN) training. Traditionally, training of DCNNs is purely data-driven. This often results in learning features of the data that are only coincidentally correlated with class labels. Human saliency can guide network training using our proposed new component of the loss function that ConveYs Brain Oversight to Raise Generalization (CYBORG) and penalizes the model for using non-salient regions. This mechanism produces DCNNs achieving higher accuracy and generalization compared to using the same training data without human salience. Experimental results demonstrate that CYBORG applies across multiple network architectures and problem domains (detection of synthetic faces, iris presentation attacks and anomalies in chest X-rays), while requiring significantly less data than training without human saliency guidance. Visualizations show that CYBORG-trained models' saliency is more consistent across independent training runs than traditionally-trained models, and also in better agreement with humans. To lower the cost of collecting human annotations, we also explore using deep learning to provide automated annotations. CYBORG training of CNNs addresses important issues such as reducing the appetite for large training sets, increasing interpretability, and reducing fragility by generalizing better to new types of data.
Abstract:This paper proposes a framework for a privacy-safe iris presentation attack detection (PAD) method, designed solely with synthetically-generated, identity-leakage-free iris images. Once trained, the method is evaluated in a classical way using state-of-the-art iris PAD benchmarks. We designed two generative models for the synthesis of ISO/IEC 19794-6-compliant iris images. The first model synthesizes bona fide-looking samples. To avoid ``identity leakage,'' the generated samples that accidentally matched those used in the model's training were excluded. The second model synthesizes images of irises with textured contact lenses and is conditioned by a given contact lens brand to have better control over textured contact lens appearance when forming the training set. Our experiments demonstrate that models trained solely on synthetic data achieve a lower but still reasonable performance when compared to solutions trained with iris images collected from human subjects. This is the first-of-its-kind attempt to use solely synthetic data to train a fully-functional iris PAD solution, and despite the performance gap between regular and the proposed methods, this study demonstrates that with the increasing fidelity of generative models, creating such privacy-safe iris PAD methods may be possible. The source codes and generative models trained for this work are offered along with the paper.
Abstract:Incorporating human-perceptual intelligence into model training has shown to increase the generalization capability of models in several difficult biometric tasks, such as presentation attack detection (PAD) and detection of synthetic samples. After the initial collection phase, human visual saliency (e.g., eye-tracking data, or handwritten annotations) can be integrated into model training through attention mechanisms, augmented training samples, or through human perception-related components of loss functions. Despite their successes, a vital, but seemingly neglected, aspect of any saliency-based training is the level of salience granularity (e.g., bounding boxes, single saliency maps, or saliency aggregated from multiple subjects) necessary to find a balance between reaping the full benefits of human saliency and the cost of its collection. In this paper, we explore several different levels of salience granularity and demonstrate that increased generalization capabilities of PAD and synthetic face detection can be achieved by using simple yet effective saliency post-processing techniques across several different CNNs.
Abstract:Subtle periodic signals, such as blood volume pulse and respiration, can be extracted from RGB video, enabling noncontact health monitoring at low cost. Advancements in remote pulse estimation -- or remote photoplethysmography (rPPG) -- are currently driven by deep learning solutions. However, modern approaches are trained and evaluated on benchmark datasets with ground truth from contact-PPG sensors. We present the first non-contrastive unsupervised learning framework for signal regression to mitigate the need for labelled video data. With minimal assumptions of periodicity and finite bandwidth, our approach discovers the blood volume pulse directly from unlabelled videos. We find that encouraging sparse power spectra within normal physiological bandlimits and variance over batches of power spectra is sufficient for learning visual features of periodic signals. We perform the first experiments utilizing unlabelled video data not specifically created for rPPG to train robust pulse rate estimators. Given the limited inductive biases, we successfully applied the same approach to camera-based respiration by changing the bandlimits of the target signal. This shows that the approach is general enough for unsupervised learning of bandlimited quasi-periodic signals from different domains. Furthermore, we show that the framework is effective for finetuning models on unlabelled video from a single subject, allowing for personalized and adaptive signal regressors.
Abstract:Post-mortem iris recognition is an emerging application of iris-based human identification in a forensic setup. One factor that may be useful in conditioning iris recognition methods is the tissue decomposition level, which is correlated with the post-mortem interval (PMI), i.g., the number of hours that have elapsed since death. PMI, however, is not always available, and its precise estimation remains one of the core challenges in forensic examination. This paper presents the first known to us method of PMI estimation directly from forensic iris images. To assess the feasibility of the iris-based PMI estimation, convolutional neural networks-based models (VGG19, DenseNet121, ResNet152, and Inception_v3) were trained to predict the PMI from (a) near-infrared (NIR), (b) visible (RGB), and (c) multispectral forensic iris images. Models were evaluated following a 10-fold cross-validation in (S1) sample-disjoint, (S2) subject-disjoint, and (S3) cross-dataset scenarios. We found that using the multispectral data offers a spectacularly low mean absolute error (MAE) of approximately 3.5 hours in scenario (S1), a bit worse MAE of approximately 17.5 hours in scenario (S2), and an MAE of approximately 69.0 hours of in the scenario (S3). This suggests that if the environmental conditions are favorable (e.g., bodies are kept in low temperatures), forensic iris images provide features that are indicative of the PMI and can be automatically estimated. The source codes and model weights are made available with the paper.
Abstract:Visible-light cameras can capture subtle physiological biomarkers without physical contact with the subject. We present the Multi-Site Physiological Monitoring (MSPM) dataset, which is the first dataset collected to support the study of simultaneous camera-based vital signs estimation from multiple locations on the body. MSPM enables research on remote photoplethysmography (rPPG), respiration rate, and pulse transit time (PTT); it contains ground-truth measurements of pulse oximetry (at multiple body locations) and blood pressure using contacting sensors. We provide thorough experiments demonstrating the suitability of MSPM to support research on rPPG, respiration rate, and PTT. Cross-dataset rPPG experiments reveal that MSPM is a challenging yet high quality dataset, with intra-dataset pulse rate mean absolute error (MAE) below 4 beats per minute (BPM), and cross-dataset pulse rate MAE below 2 BPM in certain cases. Respiration experiments find a MAE of 1.09 breaths per minute by extracting motion features from the chest. PTT experiments find that across the pairs of different body sites, there is high correlation between remote PTT and contact-measured PTT, which facilitates the possibility for future camera-based PTT research.
Abstract:Synthesis of same-identity biometric iris images, both for existing and non-existing identities while preserving the identity across a wide range of pupil sizes, is complex due to intricate iris muscle constriction mechanism, requiring a precise model of iris non-linear texture deformations to be embedded into the synthesis pipeline. This paper presents the first method of fully data-driven, identity-preserving, pupil size-varying s ynthesis of iris images. This approach is capable of synthesizing images of irises with different pupil sizes representing non-existing identities as well as non-linearly deforming the texture of iris images of existing subjects given the segmentation mask of the target iris image. Iris recognition experiments suggest that the proposed deformation model not only preserves the identity when changing the pupil size but offers better similarity between same-identity iris samples with significant differences in pupil size, compared to state-of-the-art linear and non-linear (bio-mechanical-based) iris deformation models. Two immediate applications of the proposed approach are: (a) synthesis of, or enhancement of the existing biometric datasets for iris recognition, mimicking those acquired with iris sensors, and (b) helping forensic human experts in examining iris image pairs with significant differences in pupil dilation. Source codes and weights of the models are made available with the paper.
Abstract:Post-mortem iris recognition is an emerging application of iris-based human identification in a forensic setup, able to correctly identify deceased subjects even three weeks post-mortem. This technique thus is considered as an important component of future forensic toolkits. The current advancements in this field are seriously slowed down by exceptionally difficult data collection, which can happen in mortuary conditions, at crime scenes, or in ``body farm'' facilities. This paper makes a novel contribution to facilitate progress in post-mortem iris recognition by offering a conditional StyleGAN-based iris synthesis model, trained on the largest-available dataset of post-mortem iris samples acquired from more than 350 subjects, generating -- through appropriate exploration of StyleGAN latent space -- multiple within-class (same identity) and between-class (different new identities) post-mortem iris images, compliant with ISO/IEC 29794-6, and with decomposition deformations controlled by the requested PMI (post mortem interval). Besides an obvious application to enhance the existing, very sparse, post-mortem iris datasets to advance -- among others -- iris presentation attack endeavors, we anticipate it may be useful to generate samples that would expose professional forensic human examiners to never-seen-before deformations for various PMIs, increasing their training effectiveness. The source codes and model weights are made available with the paper.
Abstract:Incorporating human salience into the training of CNNs has boosted performance in difficult tasks such as biometric presentation attack detection. However, collecting human annotations is a laborious task, not to mention the questions of how and where (in the model architecture) to efficiently incorporate this information into model's training once annotations are obtained. In this paper, we introduce MENTOR (huMan pErceptioN-guided preTraining fOr iris pResentation attack detection), which addresses both of these issues through two unique rounds of training. First, we train an autoencoder to learn human saliency maps given an input iris image (both real and fake examples). Once this representation is learned, we utilize the trained autoencoder in two different ways: (a) as a pre-trained backbone for an iris presentation attack detector, and (b) as a human-inspired annotator of salient features on unknown data. We show that MENTOR's benefits are threefold: (a) significant boost in iris PAD performance when using the human perception-trained encoder's weights compared to general-purpose weights (e.g. ImageNet-sourced, or random), (b) capability of generating infinite number of human-like saliency maps for unseen iris PAD samples to be used in any human saliency-guided training paradigm, and (c) increase in efficiency of iris PAD model training. Sources codes and weights are offered along with the paper.
Abstract:This paper describes the results of the 2023 edition of the ''LivDet'' series of iris presentation attack detection (PAD) competitions. New elements in this fifth competition include (1) GAN-generated iris images as a category of presentation attack instruments (PAI), and (2) an evaluation of human accuracy at detecting PAI as a reference benchmark. Clarkson University and the University of Notre Dame contributed image datasets for the competition, composed of samples representing seven different PAI categories, as well as baseline PAD algorithms. Fraunhofer IGD, Beijing University of Civil Engineering and Architecture, and Hochschule Darmstadt contributed results for a total of eight PAD algorithms to the competition. Accuracy results are analyzed by different PAI types, and compared to human accuracy. Overall, the Fraunhofer IGD algorithm, using an attention-based pixel-wise binary supervision network, showed the best-weighted accuracy results (average classification error rate of 37.31%), while the Beijing University of Civil Engineering and Architecture's algorithm won when equal weights for each PAI were given (average classification rate of 22.15%). These results suggest that iris PAD is still a challenging problem.