Ziyang Ma

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing

Jan 14, 2026
Via arXiv

Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training

Jan 06, 2026
Via arXiv

Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis

Dec 21, 2025
Via arXiv

Ghost in the Transformer: Tracing LLM Lineage with SVD-Fingerprint

Nov 17, 2025
Via arXiv

UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models

Oct 26, 2025
Via arXiv

Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception

Oct 14, 2025
Via arXiv

CoViPAL: Layer-wise Contextualized Visual Token Pruning for Large Vision-Language Models

Aug 24, 2025
Via arXiv

NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025

Jun 16, 2025
Via arXiv

Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens

Jun 10, 2025
Via arXiv

Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling

May 26, 2025
Via arXiv