Joon Son Chung

TAVID: Text-Driven Audio-Visual Interactive Dialogue Generation

Dec 23, 2025
Via arXiv

LP-CFM: Perceptual Invariance-Aware Conditional Flow Matching for Speech Modeling

Dec 23, 2025
Via arXiv

Lost in Translation, Found in Embeddings: Sign Language Translation and Alignment

Dec 08, 2025
Via arXiv

Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing

May 27, 2025
Via arXiv

Fork-Merge Decoding: Enhancing Multimodal Understanding in Audio-Visual Large Language Models

May 27, 2025
Via arXiv

AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding

May 27, 2025
Via arXiv

Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment

May 26, 2025
Via arXiv

SEED: Speaker Embedding Enhancement Diffusion Model

May 22, 2025
Via arXiv

Test-Time Augmentation for Pose-invariant Face Recognition

May 14, 2025
Via arXiv

Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization

May 08, 2025
Via arXiv