Abstract:Performing facial expression transfer under one-shot setting has been increasing in popularity among research community with a focus on precise control of expressions. Existing techniques showcase compelling results in perceiving expressions, but they lack robustness with extreme head poses. They also struggle to accurately reconstruct background details, thus hindering the realism. In this paper, we propose a novel warping technology which integrates the advantages of both 2D and 3D methods to achieve robust face re-enactment. We generate dense 3D facial flow fields in feature space to warp an input image based on target expressions without depth information. This enables explicit 3D geometric control for re-enacting misaligned source and target faces. We regularize the motion estimation capability of the 3D flow prediction network through proposed "Cyclic warp loss" by converting warped 3D features back into 2D RGB space. To ensure the generation of finer facial region with natural-background, our framework only renders the facial foreground region first and learns to inpaint the blank area which needs to be filled due to source face translation, thus reconstructing the detailed background without any unwanted pixel motion. Extensive evaluation reveals that our method outperforms state-of-the-art techniques in rendering artifact-free facial images.
Abstract:In recent advances of deep generative models, face reenactment -manipulating and controlling human face, including their head movement-has drawn much attention for its wide range of applicability. Despite its strong expressiveness, it is inevitable that the models fail to reconstruct or accurately generate unseen side of the face of a given single reference image. Most of existing methods alleviate this problem by learning appearances of human faces from large amount of data and generate realistic texture at inference time. Rather than completely relying on what generative models learn, we show that simple extension by using multiple reference images significantly improves generation quality. We show this by 1) conducting the reconstruction task on publicly available dataset, 2) conducting facial motion transfer on our original dataset which consists of multi-person's head movement video sequences, and 3) using a newly proposed evaluation metric to validate that our method achieves better quantitative results.
Abstract:While there exist a plethora of deep learning tools and frameworks, the fast-growing complexity of the field brings new demands and challenges, such as more flexible network design, speedy computation on distributed setting, and compatibility between different tools. In this paper, we introduce Neural Network Libraries (https://nnabla.org), a deep learning framework designed from engineer's perspective, with emphasis on usability and compatibility as its core design principles. We elaborate on each of our design principles and its merits, and validate our attempts via experiments.