Jianhao Ye

S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamlessly Speech-Text Alignment and Streaming Speech Decoder

Jun 16, 2025
Via arXiv

ClapFM-EVC: High-Fidelity and Flexible Emotional Voice Conversion with Dual Control from Natural Language and Speech

May 20, 2025
Via arXiv

Fine-grained Preference Optimization Improves Zero-shot Text-to-Speech

Feb 05, 2025
Via arXiv

CTEFM-VC: Zero-Shot Voice Conversion Based on Content-Aware Timbre Ensemble Modeling and Flow Matching

Nov 04, 2024
Via arXiv

Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization

Oct 18, 2024
Via arXiv

Takin-VC: Zero-shot Voice Conversion via Jointly Hybrid Content and Memory-Augmented Context-Aware Timbre Modeling

Oct 02, 2024
Via arXiv

Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models

Sep 18, 2024
Via arXiv

Improving Cross-lingual Speech Synthesis with Triplet Training Scheme

Feb 22, 2022
Via arXiv