Daxin Tan

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Sep 26, 2024

Enhancing Multilingual Speech Generation and Recognition Abilities in LLMs with Constructed Code-switched Data

Sep 17, 2024

Exploring SSL Discrete Tokens for Multilingual ASR

Sep 13, 2024

ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis

Jun 13, 2024

Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue

Dec 07, 2022

CorrectSpeech: A Fully Automated System for Speech Correction and Accent Reduction

Apr 12, 2022

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech

Mar 31, 2022

Environment Aware Text-to-Speech Synthesis

Oct 11, 2021

A study on the efficacy of model pre-training in developing neural text-to-speech system

Oct 08, 2021

Applying the Information Bottleneck Principle to Prosodic Representation Learning

Aug 05, 2021