Guangyan Zhang

Recent Advances in Speech Language Models: A Survey

Oct 01, 2024
Via arXiv

Enabling Beam Search for Language Model-Based Text-to-Speech Synthesis

Aug 29, 2024
Via arXiv

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

Jul 31, 2023
Via arXiv

Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models

May 27, 2023
Via arXiv

iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre

Jun 29, 2022
Via arXiv

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech

Mar 31, 2022
Via arXiv

Environment Aware Text-to-Speech Synthesis

Oct 11, 2021
Via arXiv

A study on the efficacy of model pre-training in developing neural text-to-speech system

Oct 08, 2021
Via arXiv

Applying the Information Bottleneck Principle to Prosodic Representation Learning

Aug 05, 2021
Via arXiv

AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style

Jul 06, 2021
Via arXiv