Xilin Jiang

StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion

Sep 16, 2024
Via arXiv

Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue

Sep 07, 2024
Via arXiv

Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation

Aug 13, 2024
Via arXiv

Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis

Jul 13, 2024
Via arXiv

SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model

May 20, 2024
Via arXiv

Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation

Mar 27, 2024
Via arXiv

Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience

Feb 06, 2024
Via arXiv

Exploring Self-Supervised Contrastive Learning of Spatial Sound Event Representation

Sep 27, 2023
Via arXiv

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform

Sep 18, 2023
Via arXiv

DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes

May 29, 2023
Via arXiv