Picture for Xie Chen

Xie Chen

Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap

Add code
Oct 22, 2024
Viaarxiv icon

LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec

Add code
Oct 21, 2024
Viaarxiv icon

DRCap: Decoding CLAP Latents with Retrieval-augmented Generation for Zero-shot Audio Captioning

Add code
Oct 12, 2024
Viaarxiv icon

SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs

Add code
Oct 12, 2024
Figure 1 for SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs
Figure 2 for SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs
Figure 3 for SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs
Figure 4 for SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs
Viaarxiv icon

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Add code
Oct 09, 2024
Viaarxiv icon

CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought

Add code
Sep 29, 2024
Figure 1 for CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought
Figure 2 for CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought
Figure 3 for CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought
Figure 4 for CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought
Viaarxiv icon

Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR

Add code
Sep 13, 2024
Figure 1 for Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR
Figure 2 for Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR
Figure 3 for Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR
Figure 4 for Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR
Viaarxiv icon

Exploring SSL Discrete Tokens for Multilingual ASR

Add code
Sep 13, 2024
Viaarxiv icon

vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders

Add code
Sep 03, 2024
Viaarxiv icon

Progressive Residual Extraction based Pre-training for Speech Representation Learning

Add code
Aug 31, 2024
Viaarxiv icon