Sung-Feng Huang

Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration

Sep 25, 2024

Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization

Jan 23, 2024

Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning

Mar 21, 2023

Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding

Jun 27, 2022

Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech

Nov 07, 2021

SpeechNet: A Universal Modularized Model for Speech Processing Tasks

May 31, 2021

Non-autoregressive Mandarin-English Code-switching Speech Recognition with Pinyin Mask-CTC and Word Embedding Regularization

Apr 06, 2021

Self-supervised Pre-training Reduces Label Permutation Instability of Speech Separation

Oct 29, 2020

Pretrained Language Model Embryology: The Birth of ALBERT

Oct 29, 2020

From Semi-supervised to Almost-unsupervised Speech Recognition with Very-low Resource by Jointly Learning Phonetic Structures from Audio and Text Embeddings

Apr 10, 2019