Zeyu Jin

Code Drift: Towards Idempotent Neural Audio Codecs

Oct 14, 2024

DMDSpeech: Distilled Diffusion Model Surpassing The Teacher in Zero-shot Speech Synthesis via Direct Metric Optimization

Oct 14, 2024

VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling

Aug 28, 2024

Improving Generalization of Speech Separation in Real-World Scenarios: Strategies in Simulation, Optimization, and Evaluation

Aug 28, 2024

VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap

May 24, 2024

A Closer Look at the Limitations of Instruction Tuning

Feb 03, 2024

Efficient Spoken Language Recognition via Multilabel Classification

Jun 02, 2023

Audio Similarity is Unreliable as a Proxy for Audio Quality

Jun 27, 2022

Music Enhancement via Image Translation and Vocoding

Apr 28, 2022

HEAR 2021: Holistic Evaluation of Audio Representations

Mar 26, 2022