Zhan Qu

GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting

Apr 28, 2024

Hongyun Yu, Zhan Qu, Qihang Yu, Jianchuan Chen, Zhonghua Jiang, Zhiwen Chen, Shengyu Zhang, Jimin Xu, Fei Wu, Chengfei Lv(+1 more)

Abstract: Recent works on audio-driven talking head synthesis using Neural Radiance Fields (NeRF) have achieved impressive results. However, because NeRF's implicit representation gives inadequate control over pose and expression, these methods still suffer from limitations such as unsynchronized or unnatural lip movements, visual jitter, and artifacts. In this paper, we propose GaussianTalker, a novel method for audio-driven talking head synthesis based on 3D Gaussian Splatting. Because 3D Gaussians are an explicit representation, binding them to 3D facial models gives intuitive control of facial motion. GaussianTalker consists of two modules: the Speaker-specific Motion Translator and the Dynamic Gaussian Renderer. The Speaker-specific Motion Translator achieves accurate lip movements specific to the target speaker through universalized audio feature extraction and customized lip motion generation. The Dynamic Gaussian Renderer introduces Speaker-specific BlendShapes to enhance facial detail representation via a latent pose, delivering stable and realistic rendered videos. Extensive experimental results suggest that GaussianTalker outperforms existing state-of-the-art methods in talking head synthesis, delivering precise lip synchronization and exceptional visual quality. Our method achieves rendering speeds of 130 FPS on an NVIDIA RTX 4090 GPU, significantly exceeding the threshold for real-time rendering performance, and can potentially be deployed on other hardware platforms.

* https://yuhongyun777.github.io/GaussianTalker/

GreeDy and CoDy: Counterfactual Explainers for Dynamic Graphs

Mar 25, 2024

Zhan Qu, Daniel Gomm, Michael Färber

Abstract: Temporal Graph Neural Networks (TGNNs), crucial for modeling dynamic graphs with time-varying interactions, face a significant challenge in explainability due to their complex model structure. Counterfactual explanations, which are key to understanding model decisions, examine how changes to the input graph affect outcomes. This paper introduces two novel counterfactual explanation methods for TGNNs: GreeDy (Greedy Explainer for Dynamic Graphs) and CoDy (Counterfactual Explainer for Dynamic Graphs). They treat explanation as a search problem, seeking alterations to the input graph that change the model's prediction. GreeDy uses a simple, greedy approach, while CoDy employs a sophisticated Monte Carlo Tree Search algorithm. Experiments show that both methods effectively generate clear explanations. Notably, CoDy outperforms GreeDy and existing factual methods, with an up to 59% higher success rate in finding significant counterfactual inputs. This highlights CoDy's potential in clarifying TGNN decision-making, increasing their transparency and trustworthiness in practice.
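
The idea of treating explanation as a search over input alterations can be illustrated with a minimal sketch of a generic greedy counterfactual search. This is not the authors' GreeDy implementation: the function names, the `predict` interface, the toy scoring model, and the stopping rule are all assumptions made for illustration, and a real TGNN would replace `toy_predict`.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Optional, Set

# Hypothetical per-event influence used only by the toy model below.
WEIGHTS: Dict[int, float] = {0: 0.05, 1: 0.3, 2: 0.2, 3: 0.01}


def toy_predict(removed: FrozenSet[int]) -> float:
    """Toy stand-in for a TGNN: score after deleting `removed` events."""
    return 0.9 - sum(WEIGHTS[e] for e in removed)


def greedy_counterfactual(
    events: Iterable[int],
    predict: Callable[[FrozenSet[int]], float],
    threshold: float = 0.5,
    max_removed: int = 10,
) -> Optional[Set[int]]:
    """Greedily delete past events until the thresholded prediction flips.

    Returns the set of removed events, or None if no flip was found.
    """
    candidates = set(events)
    removed: Set[int] = set()
    original_positive = predict(frozenset()) >= threshold

    def score_after(event: int) -> float:
        # Lower is better: push the score toward the opposite side.
        score = predict(frozenset(removed | {event}))
        return score if original_positive else -score

    while candidates and len(removed) < max_removed:
        best = min(candidates, key=score_after)
        removed.add(best)
        candidates.remove(best)
        now_positive = predict(frozenset(removed)) >= threshold
        if now_positive != original_positive:
            return removed
    return None
```

With the toy model, `greedy_counterfactual(WEIGHTS.keys(), toy_predict)` removes the two most influential events before the score drops below the threshold. A tree search such as the one CoDy uses would explore alternative removal orders instead of committing to the locally best event at every step.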

Diversity Matters: Fully Exploiting Depth Clues for Reliable Monocular 3D Object Detection

May 19, 2022

Zhuoling Li, Zhan Qu, Yang Zhou, Jianzhuang Liu, Haoqian Wang, Lihui Jiang

Abstract: As an inherently ill-posed problem, depth estimation from single images is the most challenging part of monocular 3D object detection (M3OD). Many existing methods rely on preconceived assumptions to compensate for the spatial information missing from monocular images, and predict a single depth value for every object of interest. However, these assumptions do not always hold in practical applications. To tackle this problem, we propose a depth solving system that fully explores the visual clues from the subtasks in M3OD and generates multiple estimations for the depth of each target. Since the depth estimations rely on essentially different assumptions, they present diverse distributions. Even if some assumptions collapse, the estimations established on the remaining assumptions are still reliable. In addition, we develop a depth selection and combination strategy. This strategy removes abnormal estimations caused by collapsed assumptions and adaptively combines the remaining estimations into a single one. In this way, our depth solving system becomes more precise and robust. Exploiting the clues from multiple subtasks of M3OD and without introducing any extra information, our method surpasses the current best method by a relative margin of more than 20% on the Moderate level of the test split in the KITTI 3D object detection benchmark, while still maintaining real-time efficiency.
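
The "select, then combine" step can be sketched as follows. The abstract does not specify the selection rule or the weighting, so this sketch swaps in a generic robust fusion: outliers are rejected with a median absolute deviation (MAD) test and the survivors are averaged with inverse-variance weights. The function name `combine_depths`, the parameter `k`, and the per-estimate uncertainties `sigmas` are assumptions for illustration only.

```python
from statistics import median
from typing import Sequence


def combine_depths(
    depths: Sequence[float],
    sigmas: Sequence[float],
    k: float = 3.0,
) -> float:
    """Fuse several depth estimates of one object into a single value.

    depths: candidate depths (metres), each derived from a different clue.
    sigmas: assumed uncertainty of each candidate (same length as depths).
    k:      rejection threshold in units of MAD; keep k >= 1 so that at
            least half of the candidates always survive.
    """
    med = median(depths)
    # MAD is a robust spread estimate; fall back to a tiny value if it is 0.
    mad = median(abs(d - med) for d in depths) or 1e-6
    kept = [(d, s) for d, s in zip(depths, sigmas) if abs(d - med) <= k * mad]
    # Inverse-variance weighting: more certain estimates count for more.
    weights = [1.0 / (s * s) for _, s in kept]
    return sum(w * d for w, (d, _) in zip(weights, kept)) / sum(weights)
```

For example, `combine_depths([20.0, 20.5, 19.5, 35.0], [1.0, 1.0, 1.0, 1.0])` discards the 35.0 m estimate as abnormal and averages the remaining three.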

* Accepted as an oral presentation at CVPR 2022

Focus on Local: Detecting Lane Marker from Bottom Up via Key Point

May 28, 2021

Zhan Qu, Huan Jin, Yang Zhou, Zhen Yang, Wei Zhang

Abstract: Mainstream lane marker detection methods are implemented by predicting the overall structure and deriving parametric curves through post-processing. Complex lane line shapes require high-dimensional CNN outputs to model global structures, which further increases the demand for model capacity and training data. In contrast, the locality of a lane marker has finite geometric variations and spatial coverage. We propose a novel lane marker detection solution, FOLOLane, that focuses on modeling local patterns and predicts global structures in a bottom-up manner. Specifically, the CNN models low-complexity local patterns with two separate heads: the first predicts the existence of key points, and the second refines the location of key points in the local range and correlates key points of the same lane line. The locality of the task is consistent with the limited FOV of CNN features, which in turn leads to more stable training and better generalization. In addition to a greedy decoding algorithm, we propose an efficiency-oriented one, which achieves 36% runtime gains at the cost of negligible performance degradation. Both decoders integrate local information into the global geometry of lane markers. Without a complex network architecture design, the proposed method greatly outperforms all existing methods on public datasets, achieving state-of-the-art results and real-time processing simultaneously.
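
A bottom-up decoder of this kind can be sketched roughly as below. This is a simplified illustration, not FOLOLane's actual decoder: it assumes key points are already extracted per image row, that each key point carries a predicted horizontal offset to its neighbour on the next row up, and that points are linked greedily within a tolerance. The names `decode_lanes`, `rows`, `up_offsets`, and `tol` are hypothetical.

```python
from typing import List, Tuple


def decode_lanes(
    rows: List[List[float]],
    up_offsets: List[List[float]],
    tol: float = 2.0,
) -> List[List[Tuple[int, float]]]:
    """Link per-row key points into lanes, starting from the bottom row.

    rows[i]:          x positions of key points on row i (row 0 = bottom).
    up_offsets[i][j]: predicted x offset from key point j on row i to the
                      key point of the same lane on row i + 1.
    tol:              maximum distance (pixels) for accepting a link.
    """
    lanes: List[List[Tuple[int, float]]] = []
    for start, x in enumerate(rows[0]):
        lane = [(0, x)]
        cur = start
        for i in range(len(rows) - 1):
            target = rows[i][cur] + up_offsets[i][cur]
            nxt = rows[i + 1]
            if not nxt:
                break
            # Greedy choice: nearest key point to the predicted location.
            best = min(range(len(nxt)), key=lambda k: abs(nxt[k] - target))
            if abs(nxt[best] - target) > tol:
                break  # no plausible continuation; the lane ends here
            lane.append((i + 1, nxt[best]))
            cur = best
        lanes.append(lane)
    return lanes
```

Each lane is built only from local decisions between adjacent rows, which is the sense in which the global geometry emerges from local information in this sketch.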

* Accepted to CVPR 2021
