Abstract:In deep learning, the loss function plays a crucial role in optimizing the network. Many recent innovations in loss techniques have been made, and various margin-based angular loss functions (metric loss) have been designed particularly for face recognition. The concept of transformers is already well-researched and applied in many facets of machine vision. This paper presents a technique for loss evaluation that uses a transformer network as an additive loss in the face recognition domain. The standard metric loss function typically takes the final embedding of the main CNN backbone as its input. Here, we employ a transformer-metric loss, a combined approach that integrates both transformer-loss and metric-loss. This research intends to analyze the transformer behavior on the convolution output when the CNN outcome is arranged in a sequential vector. The transformer encoder takes input from the contextual vectors obtained from the final convolution layer of the network. With this technique, we use transformer loss with various base metric-loss functions to evaluate the effect of the combined loss functions. We observe that such a configuration allows the network to achieve SoTA results on various validation datasets with some limitations. This research expands the role of transformers in the machine vision domain and opens new possibilities for exploring transformers as a loss function.
Abstract:Over the past decade, there has been a steady advancement in enhancing face recognition algorithms leveraging advanced machine learning methods. The role of the loss function is pivotal in addressing face verification problems and playing a game-changing role. These loss functions have mainly explored variations among intra-class or inter-class separation. This research examines the natural phenomenon of facial symmetry in the face verification problem. The symmetry between the left and right hemi faces has been widely used in many research areas in recent decades. This paper adopts this simple approach judiciously by splitting the face image vertically into two halves. With the assumption that the natural phenomena of facial symmetry can enhance face verification methodology, we hypothesize that the two output embedding vectors of split faces must project close to each other in the output embedding space. Inspired by this concept, we penalize the network based on the disparity of embedding of the symmetrical pair of split faces. Symmetrical loss has the potential to minimize minor asymmetric features due to facial expression and lightning conditions, hence significantly increasing the inter-class variance among the classes and leading to more reliable face embedding. This loss function propels any network to outperform its baseline performance across all existing network architectures and configurations, enabling us to achieve SoTA results.