Abstract:Lip-based biometric authentication (LBBA) has attracted many researchers during the last decade. The lip is specifically interesting for biometric researchers because it is a twin biometric with the potential to function both as a physiological and a behavioral trait. Although much valuable research was conducted on LBBA, none of them considered the different emotions of the client during the video acquisition step of LBBA, which can potentially affect the client's facial expressions and speech tempo. We proposed a novel network structure called WhisperNetV2, which extends our previously proposed network called WhisperNet. Our proposed network leverages a deep Siamese structure with triplet loss having three identical SlowFast networks as embedding networks. The SlowFast network is an excellent candidate for our task since the fast pathway extracts motion-related features (behavioral lip movements) with a high frame rate and low channel capacity. The slow pathway extracts visual features (physiological lip appearance) with a low frame rate and high channel capacity. Using an open-set protocol, we trained our network using the CREMA-D dataset and acquired an Equal Error Rate (EER) of 0.005 on the test set. Considering that the acquired EER is less than most similar LBBA methods, our method can be considered as a state-of-the-art LBBA method.
Abstract:A low amount of training data for many state-of-the-art deep learning-based Face Recognition (FR) systems causes a marked deterioration in their performance. Although a considerable amount of research has addressed this issue by inventing new data augmentation techniques, using either input space transformations or Generative Adversarial Networks (GAN) for feature space augmentations, these techniques have yet to satisfy expectations. In this paper, we propose a novel method, named the Face Representation Augmentation (FRA) algorithm, for augmenting face datasets. To the best of our knowledge, FRA is the first method that shifts its focus towards manipulating the face embeddings generated by any face representation learning algorithm in order to generate new embeddings representing the same identity and facial emotion but with an altered posture. Extensive experiments conducted in this study convince the efficacy of our methodology and its power to provide noiseless, completely new facial representations to improve the training procedure of any FR algorithm. Therefore, FRA is able to help the recent state-of-the-art FR methods by providing more data for training FR systems. The proposed method, using experiments conducted on the Karolinska Directed Emotional Faces (KDEF) dataset, improves the identity classification accuracies by 9.52 %, 10.04 %, and 16.60 %, in comparison with the base models of MagFace, ArcFace, and CosFace, respectively.