
Joon Son Chung

UNMIXX: Untangling Highly Correlated Singing Voices Mixtures

Jan 19, 2026
via arXiv

FastAV: Efficient Token Pruning for Audio-Visual Large Language Model Inference

Jan 19, 2026
via arXiv

LAMB: LLM-based Audio Captioning with Modality Gap Bridging via Cauchy-Schwarz Divergence

Jan 08, 2026
via arXiv

TAVID: Text-Driven Audio-Visual Interactive Dialogue Generation

Dec 23, 2025
via arXiv

LP-CFM: Perceptual Invariance-Aware Conditional Flow Matching for Speech Modeling

Dec 23, 2025
via arXiv

Lost in Translation, Found in Embeddings: Sign Language Translation and Alignment

Dec 08, 2025
via arXiv

Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing

May 27, 2025
via arXiv

AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding

May 27, 2025
via arXiv

Fork-Merge Decoding: Enhancing Multimodal Understanding in Audio-Visual Large Language Models

May 27, 2025
via arXiv

Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment

May 26, 2025
via arXiv