Zeyu Jin

Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference

Jan 23, 2025
Via arXiv

Code Drift: Towards Idempotent Neural Audio Codecs

Oct 14, 2024
Via arXiv

DMDSpeech: Distilled Diffusion Model Surpassing The Teacher in Zero-shot Speech Synthesis via Direct Metric Optimization

Oct 14, 2024
Via arXiv

VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling

Aug 28, 2024
Via arXiv

Improving Generalization of Speech Separation in Real-World Scenarios: Strategies in Simulation, Optimization, and Evaluation

Aug 28, 2024
Via arXiv

VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap

May 24, 2024
Via arXiv

A Closer Look at the Limitations of Instruction Tuning

Feb 03, 2024
Via arXiv

Efficient Spoken Language Recognition via Multilabel Classification

Jun 02, 2023
Via arXiv

Audio Similarity is Unreliable as a Proxy for Audio Quality

Jun 27, 2022
Via arXiv

Music Enhancement via Image Translation and Vocoding

Apr 28, 2022
Via arXiv