Picture for Hongyu Gong

Hongyu Gong

DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models

Add code
Oct 31, 2024
Viaarxiv icon

Investigating Decoder-only Large Language Models for Speech-to-text Translation

Add code
Jul 03, 2024
Viaarxiv icon

Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation

Add code
Jun 04, 2024
Viaarxiv icon

SeamlessExpressiveLM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought

Add code
May 30, 2024
Viaarxiv icon

MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation

Add code
Mar 19, 2024
Viaarxiv icon

An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis

Add code
Mar 19, 2024
Viaarxiv icon

Seamless: Multilingual Expressive and Streaming Speech Translation

Add code
Dec 08, 2023
Figure 1 for Seamless: Multilingual Expressive and Streaming Speech Translation
Figure 2 for Seamless: Multilingual Expressive and Streaming Speech Translation
Figure 3 for Seamless: Multilingual Expressive and Streaming Speech Translation
Figure 4 for Seamless: Multilingual Expressive and Streaming Speech Translation
Viaarxiv icon

SeamlessM4T-Massively Multilingual & Multimodal Machine Translation

Add code
Aug 23, 2023
Figure 1 for SeamlessM4T-Massively Multilingual & Multimodal Machine Translation
Figure 2 for SeamlessM4T-Massively Multilingual & Multimodal Machine Translation
Figure 3 for SeamlessM4T-Massively Multilingual & Multimodal Machine Translation
Figure 4 for SeamlessM4T-Massively Multilingual & Multimodal Machine Translation
Viaarxiv icon

Multilingual Speech-to-Speech Translation into Multiple Target Languages

Add code
Jul 17, 2023
Viaarxiv icon

Pre-training for Speech Translation: CTC Meets Optimal Transport

Add code
Jan 27, 2023
Viaarxiv icon