Abstract:In the post-pandemic era, wearing face masks has posed great challenge to the ordinary face recognition. In the previous study, researchers has applied pretrained VGG16, and ResNet50 to extract features on the elaborate curated existing masked face recognition (MFR) datasets, RMFRD and SMFRD. To make the model more adaptable to the real world situation where the sample size is smaller and the camera environment has greater changes, we created a more challenging masked face dataset ourselves, by selecting 50 identities with 1702 images from Labelled Faces in the Wild (LFW) Dataset, and simulated face masks through key point detection. The another part of our study is to solve the masked face recognition problem, and we chose models by referring to the former state of the art results, instead of directly using pretrained models, we fine tuned the model on our new dataset and use the last linear layer to do the classification directly. Furthermore, we proposed using data augmentation strategy to further increase the test accuracy, and fine tuned a new networks beyond the former study, one of the most SOTA networks, Inception ResNet v1. The best test accuracy on 50 identity MFR has achieved 95%.
Abstract:In this paper, we introduce a stochastic learning framework for long-term video interpolation. While most existing interpolation models require two reference frames with a short interval, our framework predicts a plausible intermediate sequence between a long interval. Our model consists of two parts: (1) a deterministic estimation to guarantee the spatial and temporal coherency among frames, (2) a stochastic sampling process to generate dynamics from inferred distributions. Experimental results show that our model is able to generate sharp and clear sequences with variations. Moreover, motions in the generated sequence are realistic and able to transfer smoothly from the referenced start frame to the end frame.