Se Jin Park

Long-Form Speech Generation with Spoken Language Models

Dec 24, 2024
Via arXiv

Empathetic Response in Audio-Visual Conversations Using Emotion Preference Optimization and MambaCompressor

Dec 23, 2024
Via arXiv

AV-EmoDialog: Chat with Audio-Visual Users Leveraging Emotional Cues

Dec 23, 2024
Via arXiv

Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

Jun 12, 2024
Via arXiv

Persona Extraction Through Semantic Similarity for Emotional Support Conversation Generation

Mar 07, 2024
Via arXiv

Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units

Jan 18, 2024
Via arXiv

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

Dec 05, 2023
Via arXiv

Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model

Oct 23, 2023
Via arXiv

Reprogramming Audio-driven Talking Face Synthesis into Text-driven

Jun 28, 2023
Via arXiv

Exploring Phonetic Context in Lip Movement for Authentic Talking Face Generation

May 31, 2023
Via arXiv