Talking Head Generation


Talking head generation is the task of synthesizing a video of a person speaking from an audio recording of their voice.

Supervising 3D Talking Head Avatars with Analysis-by-Audio-Synthesis

Apr 18, 2025
Via arXiv

Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation

Apr 03, 2025
Via arXiv

OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication

Apr 03, 2025
Via arXiv

Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation

Apr 08, 2025
Via arXiv

VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing

Apr 08, 2025
Via arXiv

Contrastive Decoupled Representation Learning and Regularization for Speech-Preserving Facial Expression Manipulation

Apr 08, 2025
Via arXiv

Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics

Mar 27, 2025
Via arXiv

Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis

Mar 28, 2025
Via arXiv

MoCha: Towards Movie-Grade Talking Character Synthesis

Mar 30, 2025
Via arXiv

EmoHead: Emotional Talking Head via Manipulating Semantic Expression Parameters

Mar 25, 2025
Via arXiv