Junjie Pan

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Jun 04, 2024
Via arXiv

Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features

Dec 12, 2022
Via arXiv

A Novel Chinese Dialect TTS Frontend with Non-Autoregressive Neural Machine Translation

Jun 15, 2022
Via arXiv

Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

Oct 11, 2021
Via arXiv

A hybrid text normalization system using multi-head self-attention for Mandarin

Nov 11, 2019
Via arXiv

A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis

Nov 11, 2019
Via arXiv