Daxin Tan

PROST-LLM: Progressively Enhancing the Speech-to-Speech Translation Capability in LLMs

Jan 23, 2026

DSA-Tokenizer: Disentangled Semantic-Acoustic Tokenization via Flow Matching-based Hierarchical Fusion

Jan 15, 2026

AEQ-Bench: Measuring Empathy of Omni-Modal Large Models

Jan 15, 2026

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Sep 26, 2024

Enhancing Multilingual Speech Generation and Recognition Abilities in LLMs with Constructed Code-switched Data

Sep 17, 2024

Exploring SSL Discrete Tokens for Multilingual ASR

Sep 13, 2024

ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis

Jun 13, 2024

Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue

Dec 07, 2022

CorrectSpeech: A Fully Automated System for Speech Correction and Accent Reduction

Apr 12, 2022

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech

Mar 31, 2022