Wenming Tu

VISA: A Visual Information Strengthened Audio-Reasoning System for the Interspeech 2026 ARC Agent Track

Jun 05, 2026

MMAE: A Massive Multitask Audio Editing Benchmark

Jun 05, 2026

A Unified and Reproducible Experimentation Framework for Speech Understanding

May 29, 2026

Audio-Mind: An Auditable Agentic Framework for Audio Understanding

May 27, 2026

MOVA: Towards Scalable and Synchronized Video-Audio Generation

Feb 09, 2026

UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models

Oct 26, 2025

Causal Graph Guided Steering of LLM Values via Prompts and Sparse Autoencoders

Dec 31, 2024

V-LoRA: An Efficient and Flexible System Boosts Vision Applications with LoRA LMM

Nov 01, 2024