Jianwei Yu

Audio ControlNet for Fine-Grained Audio Generation and Editing

Feb 04, 2026

VIBEVOICE-ASR Technical Report

Jan 26, 2026

A Unified Neural Codec Language Model for Selective Editable Text to Speech Generation

Jan 18, 2026

VibeVoice Technical Report

Aug 26, 2025

SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement

Jun 09, 2025

LeVo: High-Quality Song Generation with Multi-Preference Alignment

Jun 09, 2025

WAKE: Watermarking Audio with Key Enrichment

Jun 06, 2025

Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model

Jun 04, 2025

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

May 19, 2025

Kimi-Audio Technical Report

Apr 25, 2025