Zhihao Du

CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models

Dec 13, 2024

Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap

Oct 22, 2024

IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities

Oct 09, 2024

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

Jul 09, 2024

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

Feb 13, 2024

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

Oct 11, 2023

SA-Paraformer: Non-autoregressive End-to-End Speaker-Attributed ASR

Oct 07, 2023

The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR

Sep 24, 2023

FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec

Sep 14, 2023

CASA-ASR: Context-Aware Speaker-Attributed ASR

May 21, 2023