Changhe Song

Towards a Client-Centered Assessment of LLM Therapists by Client Simulation

Jun 18, 2024

3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization

Mar 29, 2024

Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation

Sep 04, 2023

SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge

Sep 04, 2023

Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

Aug 31, 2023

Towards Cross-speaker Reading Style Transfer on Audiobook Dataset

Aug 19, 2022

Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis

Apr 03, 2022

An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer

Mar 31, 2022

A Character-level Span-based Model for Mandarin Prosodic Structure Prediction

Mar 31, 2022

Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion

Mar 24, 2022