Zhiyuan Zhu

Continual Robot Policy Learning via Variational Neural Dynamics

Jun 25, 2026

Audio Editing in the Era of Foundation Models: A Survey

Jun 22, 2026

Spatial-Omni: Spatial Audio Understanding Integration in Multimodal LLMs via FOA Encoding

Jun 09, 2026

Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios

May 27, 2026

Cross-Modal Coreference Alignment: Enabling Reliable Information Transfer in Omni-LLMs

Apr 07, 2026

Learning Agile Quadrotor Flight in the Real World

Feb 10, 2026

Synthetic Singers: A Review of Deep-Learning-based Singing Voice Synthesis Approaches

Jan 20, 2026

STARS: A Unified Framework for Singing Transcription, Alignment, and Refined Style Annotation

Jul 09, 2025

TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis

May 20, 2025

ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting

Apr 29, 2025