Xie Chen

VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization

Dec 13, 2024

Generative modeling assisted simulation of measurement-altered quantum criticality

Dec 02, 2024

A Comparative Study of LLM-based ASR and Whisper in Low Resource and Code Switching Scenario

Dec 01, 2024

k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning

Nov 26, 2024

CTC-Assisted LLM-Based Contextual ASR

Nov 10, 2024

Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap

Oct 22, 2024

LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec

Oct 21, 2024

SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs

Oct 12, 2024

DRCap: Decoding CLAP Latents with Retrieval-augmented Generation for Zero-shot Audio Captioning

Oct 12, 2024

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Oct 09, 2024