
Jeong Hun Yeo

Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language

Sep 02, 2024

Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

Jun 12, 2024

Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

Feb 23, 2024

Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units

Jan 18, 2024

Visual Speech Recognition for Low-resource Languages with Automatic Labels From Whisper Model

Sep 15, 2023

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens

Sep 15, 2023

Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge

Aug 18, 2023

AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model

Aug 15, 2023

Multi-Temporal Lip-Audio Memory for Visual Speech Recognition

May 08, 2023

Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading

Apr 04, 2022