Abstract:In this paper, we present an approach for combining non-rigid structure-from-motion (NRSfM) with deep generative models,and propose an efficient framework for discovering trajectories in the latent space of 2D GANs corresponding to changes in 3D geometry. Our approach uses recent advances in NRSfM and enables editing of the camera and non-rigid shape information associated with the latent codes without needing to retrain the generator. This formulation provides an implicit dense 3D reconstruction as it enables the image synthesis of novel shapes from arbitrary view angles and non-rigid structure. The method is built upon a sparse backbone, where a neural regressor is first trained to regress parameters describing the cameras and sparse non-rigid structure directly from the latent codes. The latent trajectories associated with changes in the camera and structure parameters are then identified by estimating the local inverse of the regressor in the neighborhood of a given latent code. The experiments show that our approach provides a versatile, systematic way to model, analyze, and edit the geometry and non-rigid structures of faces.
Abstract:In this paper, we use a tensor model based on the Higher-Order Singular Value Decomposition (HOSVD) to discover semantic directions in Generative Adversarial Networks. This is achieved by first embedding a structured facial expression database into the latent space using the e4e encoder. Specifically, we discover directions in latent space corresponding to the six prototypical emotions: anger, disgust, fear, happiness, sadness, and surprise, as well as a direction for yaw rotation. These latent space directions are employed to change the expression or yaw rotation of real face images. We compare our found directions to similar directions found by two other methods. The results show that the visual quality of the resultant edits are on par with State-of-the-Art. It can also be concluded that the tensor-based model is well suited for emotion and yaw editing, i.e., that the emotion or yaw rotation of a novel face image can be robustly changed without a significant effect on identity or other attributes in the images.
Abstract:In this paper, we show that the affine, non-rigid structure-from-motion problem can be solved by rank-one, thus degenerate, basis shapes. It is a natural reformulation of the classic low-rank method by Bregler et al., where it was assumed that the deformable 3D structure is generated by a linear combination of rigid basis shapes. The non-rigid shape will be decomposed into the mean shape and the degenerate shapes, constructed from the right singular vectors of the low-rank decomposition. The right singular vectors are affinely back-projected into the 3D space, and the affine back-projections will also be solved as part of the factorisation. By construction, a direct interpretation for the right singular vectors of the low-rank decomposition will also follow: they can be seen as principal components, hence, the first variant of our method is referred to as Rank-1-PCA. The second variant, referred to as Rank-1-ICA, additionally estimates the orthogonal transform which maps the deformation modes into as statistically independent modes as possible. It has the advantage of pinpointing statistically dependent subspaces related to, for instance, lip movements on human faces. Moreover, in contrast to prior works, no predefined dimensionality for the subspaces is imposed. The experiments on several datasets show that the method achieves better results than the state-of-the-art, it can be computed faster, and it provides an intuitive interpretation for the deformation modes.