Yusuke Yasuda

Automatic design optimization of preference-based subjective evaluation with online learning in crowdsourcing environment

Mar 10, 2024

Preference-based training framework for automatic speech quality assessment using deep neural network

Aug 29, 2023

Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language

Dec 16, 2022

Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder

Dec 16, 2022

ESPnet2-TTS: Extending the Edge of TTS Research

Oct 15, 2021

Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis

Nov 10, 2020

End-to-End Text-to-Speech using Latent Duration based on VQ-VAE

Oct 20, 2020

Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis

May 20, 2020

Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment

Oct 28, 2019

Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments

Aug 30, 2019