Jae-Min Kim

A Two-Step Approach for Data-Efficient French Pronunciation Learning

Oct 08, 2024
Via arXiv

Cross-Lingual Transfer Learning for Phrase Break Prediction with Multilingual Language Model

Jun 05, 2023
Via arXiv

Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis

Oct 28, 2022
Via arXiv

Language Model-Based Emotion Prediction Methods for Emotional Speech Synthesis Systems

Jul 01, 2022
Via arXiv

TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder

Jun 30, 2022
Via arXiv

Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation

Apr 21, 2022
Via arXiv

Improved parallel WaveGAN vocoder with perceptually weighted spectrogram loss

Jan 19, 2021
Via arXiv

Parallel waveform synthesis based on generative adversarial networks with voicing-aware conditional discriminators

Oct 27, 2020
Via arXiv

Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

Oct 25, 2019
Via arXiv

Effective parameter estimation methods for an ExcitNet model in generative text-to-speech systems

May 21, 2019
Via arXiv