
Michael Stengel

Coherent3D: Coherent 3D Portrait Video Reconstruction via Triplane Fusion

Dec 11, 2024

Shengze Wang, Xueting Li, Chao Liu, Matthew Chan, Michael Stengel, Henry Fuchs, Shalini De Mello, Koki Nagano

Abstract: Recent breakthroughs in single-image 3D portrait reconstruction have enabled telepresence systems to stream 3D portrait videos from a single camera in real time, democratizing telepresence. However, per-frame 3D reconstruction exhibits temporal inconsistency and forgets the user's appearance. On the other hand, self-reenactment methods can render coherent 3D portraits by driving a 3D avatar built from a single reference image, but fail to faithfully preserve the user's per-frame appearance (e.g., instantaneous facial expression and lighting). As a result, neither framework is an ideal solution for democratized 3D telepresence. In this work, we address this dilemma and propose a novel solution that maintains both coherent identity and dynamic per-frame appearance to enable the best possible realism. To this end, we propose a new fusion-based method that takes the best of both worlds by fusing a canonical 3D prior from a reference view with dynamic appearance from per-frame input views, producing temporally stable 3D videos with faithful reconstruction of the user's per-frame appearance. Trained only on synthetic data produced by an expression-conditioned 3D GAN, our encoder-based method achieves state-of-the-art 3D reconstruction and temporal consistency on both in-studio and in-the-wild datasets. Project page: https://research.nvidia.com/labs/amri/projects/coherent3d

* arXiv admin note: substantial text overlap with arXiv:2405.00794
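The abstract describes fusing a canonical 3D prior (from a reference view) with per-frame information, and the title indicates both are triplanes. The paper learns this fusion with an encoder; the sketch below only illustrates the data flow with a hypothetical per-texel convex blend. The weight map `weight`, the function name `fuse_triplanes`, and the (3, C, H, W) layout are all assumptions, not the authors' design.

```python
import numpy as np


def fuse_triplanes(canonical: np.ndarray, per_frame: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for learned triplane fusion: a per-texel convex blend.

    canonical, per_frame: arrays of shape (3, C, H, W), one feature map per plane.
    weight: array broadcastable to (3, 1, H, W); 1.0 keeps the per-frame feature,
    0.0 keeps the canonical one. In a learned system a network would predict it.
    """
    if canonical.shape != per_frame.shape:
        raise ValueError("triplanes must have the same shape")
    w = np.clip(weight, 0.0, 1.0)  # clipping keeps the blend convex
    return w * per_frame + (1.0 - w) * canonical


# Example: trust the per-frame triplane only on the left half of every plane.
rng = np.random.default_rng(0)
canonical = rng.normal(size=(3, 8, 4, 4))
per_frame = rng.normal(size=(3, 8, 4, 4))
weight = np.zeros((3, 1, 4, 4))
weight[..., :2] = 1.0
fused = fuse_triplanes(canonical, per_frame, weight)
```

A fixed blend like this cannot decide where per-frame evidence is reliable (e.g., occluded regions); that decision is what a learned fusion module is for.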

BLADE: Single-view Body Mesh Learning through Accurate Depth Estimation

Dec 11, 2024

Shengze Wang, Jiefeng Li, Tianye Li, Ye Yuan, Henry Fuchs, Koki Nagano, Shalini De Mello, Michael Stengel

Abstract: Single-image human mesh recovery is a challenging task due to the ill-posed nature of simultaneous body shape, pose, and camera estimation. Existing estimators work well on images taken from afar, but they break down as the person moves close to the camera. Moreover, current methods fail to achieve accurate 3D pose and 2D alignment at the same time. The error stems mainly from inaccurate perspective projection parameters that are heuristically derived from orthographic ones. To resolve this long-standing challenge, we present our method BLADE, which accurately recovers perspective parameters from a single image without heuristic assumptions. We start from the inverse relationship between perspective distortion and the person's Z-translation Tz, and we show that Tz can be reliably estimated from the image. We then discuss the important role of Tz in accurate human mesh recovery from close-range images. Finally, we show that, once Tz and the 3D human mesh are estimated, one can accurately recover the focal length and full 3D translation. Extensive experiments on standard benchmarks and real-world close-range images show that our method is the first to accurately recover projection parameters from a single image, and consequently attains state-of-the-art accuracy on 3D pose estimation and 2D alignment for a wide range of images. Project page: https://research.nvidia.com/labs/amri/projects/blade/
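The geometric argument in the abstract rests on pinhole projection: a point at depth z + Tz projects to f·(x + Tx)/(z + Tz), so the depth spread across the body distorts the image strongly when Tz is small and barely at all when it is large. Once Tz and the 3D mesh are known, multiplying through by depth makes the remaining unknowns linear. The sketch below is our own illustration, not necessarily BLADE's procedure. It assumes square pixels, a known principal point `c`, mesh points in a camera-aligned body frame, and a given Tz (the paper estimates Tz from the image). All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


def project(points: Sequence[Vec3], f: float, t: Vec3, c: Vec2 = (0.0, 0.0)) -> List[Vec2]:
    """Pinhole projection of body-frame points translated by t = (Tx, Ty, Tz)."""
    tx, ty, tz = t
    return [(f * (x + tx) / (z + tz) + c[0], f * (y + ty) / (z + tz) + c[1]) for x, y, z in points]


def distortion_ratio(z_near: float, z_far: float, tz: float) -> float:
    """Magnification of the nearest body point relative to the farthest; tends to 1 as tz grows."""
    return (z_far + tz) / (z_near + tz)


def _solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x


def recover_focal_and_translation(points3d, points2d, tz, c=(0.0, 0.0)):
    """Least-squares f, Tx, Ty from mesh points, their 2D projections, and a known Tz.

    (u - cx)(z + tz) = f*x + f*Tx and (v - cy)(z + tz) = f*y + f*Ty are linear
    in p = (f, f*Tx, f*Ty); we accumulate and solve the normal equations.
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for (x, y, z), (u, v) in zip(points3d, points2d):
        depth = z + tz
        for row, rhs in (([x, 1.0, 0.0], (u - c[0]) * depth), ([y, 0.0, 1.0], (v - c[1]) * depth)):
            for i in range(3):
                atb[i] += row[i] * rhs
                for j in range(3):
                    ata[i][j] += row[i] * row[j]
    f, a, b = _solve3(ata, atb)
    return f, (a / f, b / f, tz)


# Synthetic check: project a few made-up body points, then recover f, Tx, Ty.
body = [(-0.3, -0.5, -0.1), (0.2, 0.4, 0.05), (0.1, -0.2, 0.15), (-0.15, 0.3, -0.12), (0.25, 0.1, 0.0)]
true_t = (0.1, -0.05, 2.0)
keypoints = project(body, f=1000.0, t=true_t, c=(320.0, 240.0))
f_est, t_est = recover_focal_and_translation(body, keypoints, tz=2.0, c=(320.0, 240.0))
```

With noise-free keypoints the recovery is exact up to floating point; with real detections, errors in Tz propagate directly into f, which is consistent with the abstract's emphasis on estimating Tz accurately.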

Coherent 3D Portrait Video Reconstruction via Triplane Fusion

May 01, 2024

Shengze Wang, Xueting Li, Chao Liu, Matthew Chan, Michael Stengel, Josef Spjut, Henry Fuchs, Shalini De Mello, Koki Nagano

Abstract: Recent breakthroughs in single-image 3D portrait reconstruction have enabled telepresence systems to stream 3D portrait videos from a single camera in real time, potentially democratizing telepresence. However, per-frame 3D reconstruction exhibits temporal inconsistency and forgets the user's appearance. On the other hand, self-reenactment methods can render coherent 3D portraits by driving a personalized 3D prior, but fail to faithfully reconstruct the user's per-frame appearance (e.g., facial expressions and lighting). In this work, we recognize the need to maintain both coherent identity and dynamic per-frame appearance to enable the best possible realism. To this end, we propose a new fusion-based method that fuses a personalized 3D subject prior with per-frame information, producing temporally stable 3D videos with faithful reconstruction of the user's per-frame appearance. Trained only on synthetic data produced by an expression-conditioned 3D GAN, our encoder-based method achieves state-of-the-art 3D reconstruction accuracy and temporal consistency on both in-studio and in-the-wild datasets.

Real-Time Radiance Fields for Single-Image Portrait View Synthesis

May 03, 2023

Alex Trevithick, Matthew Chan, Michael Stengel, Eric R. Chan, Chao Liu, Zhiding Yu, Sameh Khamis, Manmohan Chandraker, Ravi Ramamoorthi, Koki Nagano

Abstract: We present a one-shot method to infer and render a photorealistic 3D representation from a single unposed image (e.g., a face portrait) in real time. Given a single RGB input, our image encoder directly predicts a canonical triplane representation of a neural radiance field for 3D-aware novel view synthesis via volume rendering. Our method runs at 24 fps on consumer hardware and produces higher-quality results than strong GAN-inversion baselines that require test-time optimization. To train our triplane encoder pipeline, we use only synthetic data, showing how to distill the knowledge of a pretrained 3D GAN into a feedforward encoder. Technical contributions include a Vision Transformer-based triplane encoder, a camera data augmentation strategy, and a well-designed loss function for synthetic data training. We benchmark against state-of-the-art methods, demonstrating significant improvements in robustness and image quality in challenging real-world settings. We showcase our results on portraits of faces (FFHQ) and cats (AFHQ), but our algorithm can in principle be applied to other categories for which a 3D-aware image generator exists.

* Project page: https://research.nvidia.com/labs/nxp/lp3d/
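The abstract says the encoder predicts a canonical triplane representation of a neural radiance field that is then rendered by volume rendering. The NumPy sketch below shows how a generic triplane is queried at a 3D point and how samples along one ray are alpha-composited. It is not taken from the paper's code. Assumptions: the plane order (XY, XZ, YZ), coordinates normalized to [-1, 1], and summation as the feature aggregation (one common choice for triplanes). The small MLP decoder that would turn features into density and color is omitted.

```python
import numpy as np


def bilinear_sample(plane: np.ndarray, u: float, v: float) -> np.ndarray:
    """Sample a (C, H, W) feature plane at normalized coordinates u, v in [-1, 1]."""
    _, h, w = plane.shape
    x = np.clip((u + 1.0) * 0.5 * (w - 1), 0.0, w - 1)  # corner-aligned mapping to pixels
    y = np.clip((v + 1.0) * 0.5 * (h - 1), 0.0, h - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = plane[:, y0, x0] * (1 - dx) + plane[:, y0, x1] * dx
    bottom = plane[:, y1, x0] * (1 - dx) + plane[:, y1, x1] * dx
    return top * (1 - dy) + bottom * dy


def triplane_features(planes: np.ndarray, point) -> np.ndarray:
    """Project a 3D point onto the XY, XZ, and YZ planes and sum the sampled features."""
    x, y, z = point
    return (bilinear_sample(planes[0], x, y)
            + bilinear_sample(planes[1], x, z)
            + bilinear_sample(planes[2], y, z))


def composite(sigmas, colors, deltas):
    """NeRF-style alpha compositing of N samples along one ray; colors has shape (N, 3)."""
    sigmas = np.asarray(sigmas, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    colors = np.asarray(colors, dtype=float)
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: probability that the ray reaches sample i without being absorbed.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0), weights


# Constant planes make the aggregation easy to check: every channel sums to 1 + 2 + 3.
planes = np.stack([np.full((4, 8, 8), 1.0), np.full((4, 8, 8), 2.0), np.full((4, 8, 8), 3.0)])
feat = triplane_features(planes, (0.3, -0.7, 0.1))
```

Because each query touches only three 2D feature maps rather than a dense 3D grid, triplanes are cheap to predict with a 2D image encoder and cheap to sample, which is what makes real-time rendering plausible.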

Toward Standardized Classification of Foveated Displays

May 03, 2019

Josef Spjut, Ben Boudaoud, Jonghyun Kim, Trey Greer, Rachel Albert, Michael Stengel, Kaan Aksit, David Luebke

Abstract: An emerging goal in head-mounted display design is to exploit the limitations of the human visual system to reduce the computation, communication, and display workload in power- and form-factor-constrained systems. Fundamental to this reduced workload is the ability to match display resolution to the acuity of the human visual system, along with a resulting need to follow the gaze of the eye as it moves, a process referred to as foveation. A display that moves its content along with the eye may be called a Foveated Display, though this term is also commonly used to describe displays with non-uniform resolution that attempt to mimic human visual acuity. We therefore recommend a definition of the term Foveated Display that accepts both of these interpretations. Furthermore, we include a simplified model of human visual Acuity Distribution Functions (ADFs) at various levels of visual acuity across wide fields of view, and we propose comparing this ADF with the Resolution Distribution Function (RDF) of a foveated display to evaluate its resolution at a particular gaze direction. We also provide a taxonomy that allows the field to meaningfully compare and contrast various aspects of foveated displays in a manner agnostic to display and optical technology.

* 8 pages, 6 figures, pending review
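The proposed evaluation compares an ADF with a display's RDF at a given gaze direction. The sketch below does this along one dimension of the visual field. Both functions are illustrative assumptions, not the paper's model: acuity falls off with a linear minimum-angle-of-resolution (MAR) model with MAR0 = 1 arcmin and an illustrative E2 = 2.3°, one pixel per MAR is taken as sufficient, and the RDF belongs to a hypothetical display with a single 5° high-resolution inset.

```python
def acuity_ppd(ecc_deg: float, mar0_arcmin: float = 1.0, e2_deg: float = 2.3) -> float:
    """Hypothetical simplified ADF: pixels per degree the eye can resolve at an eccentricity.

    MAR(e) = MAR0 * (1 + |e| / E2); one pixel per MAR gives 60 / MAR pixels per degree.
    """
    mar_arcmin = mar0_arcmin * (1.0 + abs(ecc_deg) / e2_deg)
    return 60.0 / mar_arcmin


def two_level_rdf(x_deg: float, inset_center_deg: float = 0.0, inset_radius_deg: float = 5.0,
                  inset_ppd: float = 60.0, periphery_ppd: float = 15.0) -> float:
    """Hypothetical RDF of a display with one high-resolution inset."""
    if abs(x_deg - inset_center_deg) <= inset_radius_deg:
        return inset_ppd
    return periphery_ppd


def shortfall(x_deg: float, gaze_deg: float, rdf) -> float:
    """How many ppd the display falls below the ADF at field position x (0 if adequate)."""
    return max(0.0, acuity_ppd(x_deg - gaze_deg) - rdf(x_deg))


def inset_follows_gaze_to_20(x_deg: float) -> float:
    """The same display, but with its inset moved to follow a gaze at 20 degrees."""
    return two_level_rdf(x_deg, inset_center_deg=20.0)


# Looking 20 degrees off-center: a fixed inset leaves the fovea under-resolved,
# while an inset that follows the gaze does not.
fixed_gap = shortfall(20.0, gaze_deg=20.0, rdf=two_level_rdf)
tracked_gap = shortfall(20.0, gaze_deg=20.0, rdf=inset_follows_gaze_to_20)
```

Sweeping `x_deg` across the field of view for each gaze direction gives the kind of ADF-versus-RDF comparison the abstract describes; it also shows the two readings of "foveated display": the fixed inset mimics acuity only at one gaze, while the moving inset tracks the eye.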
