Keke He

HiFiVFS: High Fidelity Video Face Swapping

Nov 27, 2024

Xu Chen, Keke He, Junwei Zhu, Yanhao Ge, Wei Li, Chengjie Wang

Abstract: Face swapping aims to generate results that combine the identity from the source with attributes from the target. Existing methods primarily focus on image-based face swapping. When processing videos, each frame is handled independently, making it difficult to ensure temporal stability. From a model perspective, face swapping is gradually shifting from generative adversarial networks (GANs) to diffusion models (DMs), as DMs have been shown to possess stronger generative capabilities. Current diffusion-based approaches often employ inpainting techniques, which struggle to preserve fine-grained attributes like lighting and makeup. To address these challenges, we propose a high fidelity video face swapping (HiFiVFS) framework, which leverages the strong generative capability and temporal prior of Stable Video Diffusion (SVD). We build a fine-grained attribute module to extract identity-disentangled and fine-grained attribute features through identity desensitization and adversarial learning. Additionally, we introduce detailed identity injection to further enhance identity similarity. Extensive experiments demonstrate that our method achieves state-of-the-art (SOTA) performance in video face swapping, both qualitatively and quantitatively.

Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control

May 21, 2024

Yue Han, Junwei Zhu, Keke He, Xu Chen, Yanhao Ge, Wei Li, Xiangtai Li, Jiangning Zhang, Chengjie Wang, Yong Liu

Abstract: Current face reenactment and swapping methods mainly rely on GAN frameworks, but recent focus has shifted to pre-trained diffusion models for their superior generation capabilities. However, training these models is resource-intensive, and the results have not yet achieved satisfactory performance levels. To address this issue, we introduce Face-Adapter, an efficient and effective adapter designed for high-precision and high-fidelity face editing for pre-trained diffusion models. We observe that both the face reenactment and face swapping tasks essentially involve combinations of target structure, ID, and attribute. We aim to sufficiently decouple the control of these factors to achieve both tasks in one model. Specifically, our method contains: 1) a Spatial Condition Generator that provides precise landmarks and background; 2) a Plug-and-play Identity Encoder that transfers face embeddings to the text space by a transformer decoder; and 3) an Attribute Controller that integrates spatial conditions and detailed attributes. Face-Adapter achieves comparable or even superior performance in terms of motion control precision, ID retention capability, and generation quality compared to fully fine-tuned face reenactment/swapping models. Additionally, Face-Adapter seamlessly integrates with various Stable Diffusion models.

* Project Page: https://faceadapter.github.io/face-adapter.github.io/

T-Pixel2Mesh: Combining Global and Local Transformer for 3D Mesh Generation from a Single Image

Mar 20, 2024

Shijie Zhang, Boyan Jiang, Keke He, Junwei Zhu, Ying Tai, Chengjie Wang, Yinda Zhang, Yanwei Fu

Abstract: Pixel2Mesh (P2M) is a classical approach for reconstructing 3D shapes from a single color image through coarse-to-fine mesh deformation. Although P2M is capable of generating plausible global shapes, its Graph Convolution Network (GCN) often produces overly smooth results, causing the loss of fine-grained geometry details. Moreover, P2M generates non-credible features for occluded regions and struggles with the domain gap from synthetic data to real-world images, which is a common challenge for single-view 3D reconstruction methods. To address these challenges, we propose a novel Transformer-boosted architecture, named T-Pixel2Mesh, inspired by the coarse-to-fine approach of P2M. Specifically, we use a global Transformer to control the holistic shape and a local Transformer to progressively refine the local geometry details with graph-based point upsampling. To enhance real-world reconstruction, we present the simple yet effective Linear Scale Search (LSS), which serves as prompt tuning during the input preprocessing. Our experiments on ShapeNet demonstrate state-of-the-art performance, while results on real-world data show the generalization capability.

* Received by ICASSP 2024

A Jointly Learned Deep Architecture for Facial Attribute Analysis and Face Detection in the Wild

Jul 27, 2017

Keke He, Yanwei Fu, Xiangyang Xue

Abstract: Facial attribute analysis in real-world scenarios is very challenging, mainly because of complex face variations. Existing work on analyzing face attributes is mostly based on cropped and aligned face images. However, this makes attribute prediction heavily reliant on the preprocessing performed by a face detector. To address this problem, we present a novel jointly learned deep architecture for both facial attribute analysis and face detection. Our framework can process natural images in the wild, and our experiments on the CelebA and LFWA datasets show that it obtains state-of-the-art performance.
