
Hyeongseop Rha

MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens

Mar 14, 2025

AV-EmoDialog: Chat with Audio-Visual Users Leveraging Emotional Cues

Dec 23, 2024

Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language

Sep 02, 2024

Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

Jun 12, 2024

TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

Feb 25, 2024