Haiyang Sun

Pixel-Perfect Depth with Semantics-Prompted Diffusion Transformers

Oct 08, 2025
Via arXiv

Uncovering and Mitigating Destructive Multi-Embedding Attacks in Deepfake Proactive Forensics

Aug 24, 2025
Via arXiv

Step-Audio 2 Technical Report

Jul 24, 2025
Via arXiv

DiffMark: Diffusion-based Robust Watermark Against Deepfakes

Jul 02, 2025
Via arXiv

Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency

Jun 09, 2025
Via arXiv

ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving

Jun 09, 2025
Via arXiv

Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling

May 26, 2025
Via arXiv

PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth

May 03, 2025
Via arXiv

SIDME: Self-supervised Image Demoiréing via Masked Encoder-Decoder Reconstruction

Apr 16, 2025
Via arXiv

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis

Apr 14, 2025
Via arXiv