Keon Lee

DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer

Jun 17, 2024
Via arXiv

CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech

Apr 03, 2024
Via arXiv

Mini-Batch Optimization of Contrastive Loss

Jul 12, 2023
Via arXiv

Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback

Jul 06, 2023
Via arXiv

RedPen: Region- and Reason-Annotated Dataset of Unnatural Speech

Oct 26, 2022
Via arXiv

DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech

Jul 03, 2022
Via arXiv

STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech

Apr 04, 2021
Via arXiv