Jianwei Yu

MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization

Jan 03, 2025
Via arXiv

SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor

Dec 18, 2024
Via arXiv

WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction

Sep 24, 2024
Via arXiv

Comparing Discrete and Continuous Space LLMs for Speech Recognition

Sep 01, 2024
Via arXiv

Gull: A Generative Multifunctional Audio Codec

Apr 07, 2024
Via arXiv

Continuous Target Speech Extraction: Enhancing Personalized Diarization and Extraction on Complex Recordings

Jan 29, 2024
Via arXiv

Consistent and Relevant: Rethink the Query Embedding in General Sound Separation

Dec 24, 2023
Via arXiv

SECap: Speech Emotion Captioning with Large Language Model

Dec 23, 2023
Via arXiv

Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction

Oct 15, 2023
Via arXiv

Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition

Sep 27, 2023
Via arXiv