Xiquan Li

MMAE: A Massive Multitask Audio Editing Benchmark

Jun 05, 2026
Via arXiv

WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling

Jun 02, 2026
Via arXiv

Audio ControlNet for Fine-Grained Audio Generation and Editing

Feb 04, 2026
Via arXiv

SemanticAudio: Audio Generation and Editing in Semantic Space

Jan 29, 2026
Via arXiv

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing

Jan 14, 2026
Via arXiv

MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows

Aug 08, 2025
Via arXiv

Towards Reliable Large Audio Language Model

May 25, 2025
Via arXiv

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

May 19, 2025
Via arXiv

URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models

Feb 25, 2025
Via arXiv

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training

Dec 20, 2024
Via arXiv