Joanna Hong

Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

Jun 12, 2024

Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model

Oct 23, 2023

DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding

Aug 15, 2023

Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring

Mar 20, 2023

Lip-to-Speech Synthesis in the Wild with Multi-task Learning

Feb 17, 2023

SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory

Nov 03, 2022

Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition

Jul 13, 2022

VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection

Jun 15, 2022

Lip to Speech Synthesis with Visual Context Attentional GAN

Apr 04, 2022

Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video

Apr 04, 2022