Abstract:Typical neural network architectures used for image segmentation cannot be changed without further training. This is quite limiting as the network might not only be executed on a powerful server, but also on a mobile or edge device. Adaptive neural networks offer a solution to the problem by allowing certain adaptivity after the training process is complete. In this work for the first time, we apply Post-Train Adaptive (PTA) approach to the task of image segmentation. We introduce U-Net+PTA neural network, which can be trained once, and then adapted to different device performance categories. The two key components of the approach are PTA blocks and PTA-sampling training strategy. The post-train configuration can be done at runtime on any inference device including mobile. Also, the PTA approach has allowed to improve image segmentation Dice score on the CamVid dataset. The final trained model can be switched at runtime between 6 PTA configurations, which differ by inference time and quality. Importantly, all of the configurations have better quality than the original U-Net (No PTA) model.
Abstract:Many applications require high accuracy of neural networks as well as low latency and user data privacy guaranty. Face anti-spoofing is one of such tasks. However, a single model might not give the best results for different device performance categories, while training multiple models is time consuming. In this work we present Post-Train Adaptive (PTA) block. Such a block is simple in structure and offers a drop-in replacement for the MobileNetV2 Inverted Residual block. The PTA block has multiple branches with different computation costs. The branch to execute can be selected on-demand and at runtime; thus, offering different inference times and configuration capability for multiple device tiers. Crucially, the model is trained once and can be easily reconfigured after training, even directly on a mobile device. In addition, the proposed approach shows substantially better overall performance in comparison to the original MobileNetV2 as tested on CelebA-Spoof dataset. Different PTA block configurations are sampled at training time, which also decreases overall wall-clock time needed to train the model. While we present computational results for the anti-spoofing problem, the MobileNetV2 with PTA blocks is applicable to any problem solvable with convolutional neural networks, which makes the results presented practically significant.
Abstract:Neural networks require a large amount of annotated data to learn. Meta-learning algorithms propose a way to decrease the number of training samples to only a few. One of the most prominent optimization-based meta-learning algorithms is Model-Agnostic Meta-Learning (MAML). However, the key procedure of adaptation to new tasks in MAML is quite slow. In this work we propose an improvement to MAML meta-learning algorithm. We introduce Lambda patterns by which we restrict which weight are updated in the network during the adaptation phase. This makes it possible to skip certain gradient computations. The fastest pattern is selected given an allowed quality degradation threshold parameter. In certain cases, quality improvement is possible by a careful pattern selection. The experiments conducted have shown that via Lambda adaptation pattern selection, it is possible to significantly improve the MAML method in the following areas: adaptation time has been decreased by a factor of 3 with minimal accuracy loss; accuracy for one-step adaptation has been substantially improved.
Abstract:In many practical cases face detection on smartphones or other highly portable devices is a necessity. Applications include mobile face access control systems, driver status tracking, emotion recognition, etc. Mobile devices have limited processing power and should have long-enough battery life even with face detection application running. Thus, striking the right balance between algorithm quality and complexity is crucial. In this work we adapt 5 algorithms to mobile. These algorithms are based on handcrafted or neural-network-based features and include: Viola-Jones (Haar cascade), LBP, HOG, MTCNN, BlazeFace. We analyze inference time of these algorithms on different devices with different input image resolutions. We provide guidance, which algorithms are the best fit for mobile face access control systems and potentially other mobile applications. Interestingly, we note that cascaded algorithms perform faster on scenes without faces, while BlazeFace is slower on empty scenes. Exploiting this behavior might be useful in practice.
Abstract:In this paper we survey and analyze modern neural-network-based facial landmark detection algorithms. We focus on approaches that have led to a significant increase in quality over the past few years on datasets with large pose and emotion variability, high levels of face occlusions - all of which are typical in real-world scenarios. We summarize the improvements into categories, provide quality comparison on difficult and modern in-the-wild datasets: 300-W, AFLW, WFLW, COFW. Additionally, we compare algorithm speed on CPU, GPU and Mobile devices. For completeness, we also briefly touch on established methods with open implementations available. Besides, we cover applications and vulnerabilities of the landmark detection algorithms. Based on which, we raise problems that as we hope will lead to further algorithm improvements in future.
Abstract:Neural networks are now actively being used for computer vision tasks in security critical areas such as robotics, face recognition, autonomous vehicles yet their safety is under question after the discovery of adversarial attacks. In this paper we develop simplified adversarial attack algorithms based on a scoping idea, which enables execution of fast adversarial attacks that minimize structural image quality (SSIM) loss, allows performing efficient transfer attacks with low target inference network call count and opens a possibility of an attack using pen-only drawings on a paper for the MNIST handwritten digit dataset. The presented adversarial attack analysis and the idea of attack scoping can be easily expanded to different datasets, thus making the paper's results applicable to a wide range of practical tasks.