Talking Head Generation


Talking head generation is the task of synthesizing video of a person speaking, driven by an audio recording of their voice.

LPIPS-AttnWav2Lip: Generic Audio-Driven lip synchronization for Talking Head Generation in the Wild

Jan 30, 2026

EditYourself: Audio-Driven Generation and Manipulation of Talking Head Videos with Diffusion Transformers

Jan 29, 2026

Splat-Portrait: Generalizing Talking Heads with Gaussian Splatting

Jan 26, 2026

Exploring Talking Head Models With Adjacent Frame Prior for Speech-Preserving Facial Expression Manipulation

Jan 19, 2026

RSATalker: Realistic Socially-Aware Talking Head Generation for Multi-Turn Conversation

Jan 15, 2026

Generalizable and Animatable 3D Full-Head Gaussian Avatar from a Single Image

Jan 19, 2026

Now You See Me, Now You Don't: A Unified Framework for Expression Consistent Anonymization in Talking Head Videos

Jan 14, 2026

MANGO: Natural Multi-speaker 3D Talking Head Generation via 2D-Lifted Enhancement

Jan 05, 2026

Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation

Jan 02, 2026

DyStream: Streaming Dyadic Talking Heads Generation via Flow Matching-based Autoregressive Model

Dec 30, 2025