Yudong Guo

One Shot, One Talk: Whole-body Talking Avatar from a Single Image

Dec 02, 2024

Jun Xiang, Yudong Guo, Leipeng Hu, Boyang Guo, Yancheng Yuan, Juyong Zhang

Abstract:Building realistic and animatable avatars still requires minutes of multi-view or monocular self-rotating videos, and most methods lack precise control over gestures and expressions. To push this boundary, we address the challenge of constructing a whole-body talking avatar from a single image. We propose a novel pipeline that tackles two critical issues: 1) complex dynamic modeling and 2) generalization to novel gestures and expressions. To achieve seamless generalization, we leverage recent pose-guided image-to-video diffusion models to generate imperfect video frames as pseudo-labels. To overcome the dynamic modeling challenge posed by inconsistent and noisy pseudo-videos, we introduce a tightly coupled 3DGS-mesh hybrid avatar representation and apply several key regularizations to mitigate inconsistencies caused by imperfect labels. Extensive experiments on diverse subjects demonstrate that our method enables the creation of a photorealistic, precisely animatable, and expressive whole-body talking avatar from just a single image.

* Project Page: https://ustc3dv.github.io/OneShotOneTalk/

PICA: Physics-Integrated Clothed Avatar

Jul 07, 2024

Bo Peng, Yunfan Tao, Haoyu Zhan, Yudong Guo, Juyong Zhang

Abstract:We introduce PICA, a novel representation for high-fidelity animatable clothed human avatars with physics-accurate dynamics, even for loose clothing. Previous neural rendering-based representations of animatable clothed humans typically employ a single model to represent both the clothing and the underlying body. While efficient, these approaches often fail to accurately represent complex garment dynamics, leading to incorrect deformations and noticeable rendering artifacts, especially for sliding or loose garments. Furthermore, previous works represent garment dynamics as pose-dependent deformations and facilitate novel pose animations in a data-driven manner. This often results in outcomes that do not faithfully represent the mechanics of motion and are prone to generating artifacts in out-of-distribution poses. To address these issues, we adopt two individual 3D Gaussian Splatting (3DGS) models with different deformation characteristics, modeling the human body and clothing separately. This distinction allows for better handling of their respective motion characteristics. With this representation, we integrate a graph neural network (GNN)-based clothed body physics simulation module to ensure an accurate representation of clothing dynamics. Our method, through its carefully designed features, achieves high-fidelity rendering of clothed human bodies in complex and novel driving poses, significantly outperforming previous methods under the same settings.

* Project page: https://ustc3dv.github.io/PICA/

FlashAvatar: High-Fidelity Digital Avatar Rendering at 300FPS

Dec 03, 2023

Jun Xiang, Xuan Gao, Yudong Guo, Juyong Zhang

Abstract:We propose FlashAvatar, a novel and lightweight 3D animatable avatar representation that can reconstruct a digital avatar from a short monocular video sequence in minutes and render high-fidelity photo-realistic images at 300FPS on a consumer-grade GPU. To achieve this, we maintain a uniform 3D Gaussian field embedded in the surface of a parametric face model and learn extra spatial offsets to model non-surface regions and subtle facial details. While full use of geometric priors can capture high-frequency facial details and preserve exaggerated expressions, proper initialization can help reduce the number of Gaussians, thus enabling super-fast rendering speed. Extensive experimental results demonstrate that FlashAvatar outperforms existing works regarding visual quality and personalized details and is almost an order of magnitude faster in rendering speed.

* Project page: https://ustc3dv.github.io/FlashAvatar/

CosAvatar: Consistent and Animatable Portrait Video Tuning with Text Prompt

Nov 30, 2023

Haiyao Xiao, Chenglai Zhong, Xuan Gao, Yudong Guo, Juyong Zhang

Abstract:Recently, text-guided digital portrait editing has attracted more and more attention. However, existing methods still struggle to maintain consistency across time, expression, and view, or require specific data prerequisites. To solve these challenging problems, we propose CosAvatar, a high-quality and user-friendly framework for portrait tuning. With only monocular video and text instructions as input, we can produce animatable portraits with both temporal and 3D consistency. Different from methods that directly edit in the 2D domain, we employ a dynamic NeRF-based 3D portrait representation to model both the head and torso. We alternate between editing the dataset of video frames and updating the underlying 3D portrait until the edited frames reach 3D consistency. Additionally, we integrate semantic portrait priors to enhance the edited results, allowing precise modifications in specified semantic areas. Extensive results demonstrate that our proposed method can not only accurately edit portrait styles or local attributes based on text instructions but also support expressive animation driven by a source video.

* Project page: https://ustc3dv.github.io/CosAvatar/

MetaHead: An Engine to Create Realistic Digital Head

Apr 03, 2023

Dingyun Zhang, Chenglai Zhong, Yudong Guo, Yang Hong, Juyong Zhang

Abstract:Collecting and labeling training data is an important step for learning-based methods, yet the process is time-consuming and prone to bias. For face analysis tasks, although some generative models can be used to generate face data, they can only achieve a subset of generation diversity, reconstruction accuracy, 3D consistency, high-fidelity visual quality, and easy editability. One recent related work is the graphics-based generative method, but it can only render low-realism heads at high computational cost. In this paper, we propose MetaHead, a unified and full-featured controllable digital head engine, which consists of a controllable head radiance field (MetaHead-F) to super-realistically generate or reconstruct view-consistent 3D controllable digital heads and a generic top-down image generation framework, LabelHead, to generate digital heads consistent with the given customizable feature labels. Experiments validate that our controllable digital head engine achieves state-of-the-art generation visual quality and reconstruction accuracy. Moreover, the generated labeled data can assist real training data and significantly surpass the labeled data generated by graphics-based methods in terms of training effect.

* Project page: https://ustc3dv.github.io/MetaHead/

Reconstructing Personalized Semantic Facial NeRF Models From Monocular Video

Oct 12, 2022

Xuan Gao, Chenglai Zhong, Jun Xiang, Yang Hong, Yudong Guo, Juyong Zhang

Abstract:We present a novel semantic model for the human head defined with a neural radiance field. The 3D-consistent head model consists of a set of disentangled and interpretable bases, and can be driven by low-dimensional expression coefficients. Thanks to the powerful representation ability of neural radiance fields, the constructed model can represent complex facial attributes, including hair and worn accessories, which cannot be represented by traditional mesh blendshapes. To construct the personalized semantic facial model, we propose to define the bases as several multi-level voxel fields. With a short monocular RGB video as input, our method can construct the subject's semantic facial NeRF model in only ten to twenty minutes, and can render a photo-realistic human head image in tens of milliseconds with a given expression coefficient and view direction. With this novel representation, we apply it to many tasks like facial retargeting and expression editing. Experimental results demonstrate its strong representation ability and training/inference speed. Demo videos and released code are provided on our project page: https://ustc3dv.github.io/NeRFBlendShape/

* ACM Trans. Graph. 41, 6, Article 200 (December 2022), 12 pages
* Accepted by SIGGRAPH Asia 2022 (Journal Track). Project page: https://ustc3dv.github.io/NeRFBlendShape/

Prior-Guided Multi-View 3D Head Reconstruction

Jul 09, 2021

Xueying Wang, Yudong Guo, Zhongqi Yang, Juyong Zhang

Abstract:Recovering a 3D head model including the complete face and hair regions is still a challenging problem in computer vision and graphics. In this paper, we consider this problem with a few multi-view portrait images as input. Previous multi-view stereo methods, whether based on optimization strategies or deep learning techniques, suffer from low-frequency geometric structures such as unclear head structures and inaccurate reconstruction in hair regions. To tackle this problem, we propose a prior-guided implicit neural rendering network. Specifically, we model the head geometry with a learnable signed distance field (SDF) and optimize it via an implicit differentiable renderer with the guidance of some human head priors, including facial prior knowledge, head semantic segmentation information and 2D hair orientation maps. The utilization of these priors can improve the reconstruction accuracy and robustness, leading to a high-quality integrated 3D head model. Extensive ablation studies and comparisons with state-of-the-art methods demonstrate that our method can produce high-fidelity 3D head geometries with the guidance of these priors.

StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision

Apr 13, 2021

Yang Hong, Juyong Zhang, Boyi Jiang, Yudong Guo, Ligang Liu, Hujun Bao

Abstract:In this paper, we propose StereoPIFu, which integrates the geometric constraints of stereo vision with implicit function representation of PIFu, to recover the 3D shape of the clothed human from a pair of low-cost rectified images. First, we introduce the effective voxel-aligned features from a stereo vision-based network to enable depth-aware reconstruction. Moreover, the novel relative z-offset is employed to associate predicted high-fidelity human depth and occupancy inference, which helps restore fine-level surface details. Second, a network structure that fully utilizes the geometry information from the stereo images is designed to improve the human body reconstruction quality. Consequently, our StereoPIFu can naturally infer the human body's spatial location in camera space and maintain the correct relative position of different parts of the human body, which enables our method to capture human performance. Compared with previous works, our StereoPIFu significantly improves the robustness, completeness, and accuracy of the clothed human reconstruction, which is demonstrated by extensive experimental results.

* Accepted by CVPR2021. Project page: http://crishy1995.github.io/StereoPIFuProject

AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis

Mar 20, 2021

Yudong Guo, Keyu Chen, Sen Liang, Yongjin Liu, Hujun Bao, Juyong Zhang

Abstract:Generating high-fidelity talking head video by fitting with the input audio sequence is a challenging problem that has received considerable attention recently. In this paper, we address this problem with the aid of neural scene representation networks. Our method is completely different from existing methods that rely on intermediate representations like 2D landmarks or 3D face models to bridge the gap between audio input and video output. Specifically, the feature of the input audio signal is directly fed into a conditional implicit function to generate a dynamic neural radiance field, from which a high-fidelity talking-head video corresponding to the audio signal is synthesized using volume rendering. Another advantage of our framework is that not only the head (with hair) region is synthesized as previous methods did, but also the upper body is generated via two individual neural radiance fields. Experimental results demonstrate that our novel framework can (1) produce high-fidelity and natural results, and (2) support free adjustment of audio signals, viewing directions, and background images.

* Video: https://www.youtube.com/watch?v=TQO2EBYXLyU
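
As a rough illustration of the volume rendering step mentioned in the abstract above, the following is a minimal sketch of the standard NeRF-style compositing rule along a single camera ray. It is not the authors' implementation: the function name `composite_ray` and its inputs are hypothetical, and the audio conditioning that AD-NeRF applies to the radiance field is omitted. In practice the densities and colors would come from querying a trained network at sample points along the ray.

```python
import math


def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray, ordered front to back.

    densities: non-negative volume density at each sample (hypothetical input)
    colors:    (r, g, b) tuple at each sample
    deltas:    distance between consecutive samples along the ray
    Returns the accumulated pixel color and the remaining transmittance.
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, deltas):
        # Opacity contributed by this ray segment.
        alpha = 1.0 - math.exp(-sigma * delta)
        # The sample's contribution is attenuated by everything in front of it.
        weight = transmittance * alpha
        pixel = [p + weight * c for p, c in zip(pixel, rgb)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance


# Two samples: a semi-transparent red one in front of a denser green one.
pixel, remaining = composite_ray(
    densities=[0.5, 5.0],
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    deltas=[1.0, 1.0],
)
```

The returned transmittance is the share of light that passes through all samples; a renderer can use it to blend in a background image, which is one way the free choice of background described in the abstract could be realized.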

Landmark Detection and 3D Face Reconstruction for Caricature using a Nonlinear Parametric Model

Apr 20, 2020

Juyong Zhang, Hongrui Cai, Yudong Guo, Zhuang Peng

Abstract:Caricature is an artistic abstraction of the human face that distorts or exaggerates certain facial features while still retaining a likeness to the given face. Due to the large diversity of geometric and texture variations, automatic landmark detection and 3D face reconstruction for caricature is a challenging problem and has rarely been studied before. In this paper, we propose the first automatic method for this task by a novel 3D approach. To this end, we first build a dataset with various styles of 2D caricatures and their corresponding 3D shapes, and then build a parametric model on a vertex-based deformation space for 3D caricature faces. Based on the constructed dataset and the nonlinear parametric model, we propose a neural-network-based method to regress the 3D face shape and orientation from the input 2D caricature image. Ablation studies and comparison with baseline methods demonstrate the effectiveness of our algorithm design, and extensive experimental results demonstrate that our method works well for various caricatures. Our constructed dataset, source code and trained model are available at https://github.com/Juyong/CaricatureFace.

* The code is available at https://github.com/Juyong/CaricatureFace
