Yuekai Zhang

TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch

Dec 12, 2024

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

Oct 27, 2023

LightVessel: Exploring Lightweight Coronary Artery Vessel Segmentation via Similarity Knowledge Distillation

Nov 02, 2022

TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty

Nov 01, 2022

ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet

Nov 29, 2021

SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition

Apr 06, 2021

Tiny Transducer: A Highly-efficient Speech Recognition Model on Edge Devices

Feb 07, 2021

Sequence-to-sequence Singing Voice Synthesis with Perceptual Entropy Loss

Oct 22, 2020