Ziyang Ma

YuE: Scaling Open Foundation Models for Long-Form Music Generation

Mar 11, 2025
Via arXiv

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

Mar 03, 2025
Via arXiv

URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models

Feb 25, 2025
Via arXiv

Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model

Jan 13, 2025
Via arXiv

MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization

Jan 03, 2025
Via arXiv

Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers

Dec 23, 2024
Via arXiv

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training

Dec 20, 2024
Via arXiv

VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization

Dec 13, 2024
Via arXiv

A Comparative Study of LLM-based ASR and Whisper in Low Resource and Code Switching Scenario

Dec 01, 2024
Via arXiv

k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning

Nov 26, 2024
Via arXiv