David Harwath

Adapting Self-Supervised Speech Representations for Cross-lingual Dysarthria Detection in Parkinson's Disease

Mar 23, 2026

Self-Supervised Speech Models Encode Phonetic Context via Position-dependent Orthogonal Subspaces

Mar 13, 2026

Linear Script Representations in Speech Foundation Models Enable Zero-Shot Transliteration

Jan 06, 2026

FacEDiT: Unified Talking Face Editing and Generation via Facial Motion Infilling

Dec 16, 2025

VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing

Nov 15, 2025

Unifying Model and Layer Fusion for Speech Foundation Models

Nov 11, 2025

Probing the Robustness Properties of Neural Speech Codecs

May 30, 2025

Rhapsody: A Dataset for Highlight Detection in Podcasts

May 26, 2025

VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation

May 26, 2025

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models

Apr 03, 2025