Jun Zhu

Tsinghua University

RDT2: Exploring the Scaling Limit of UMI Data Towards Zero-Shot Cross-Embodiment Generalization

Feb 03, 2026

Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation

Feb 02, 2026

GMS-CAVP: Improving Audio-Video Correspondence with Multi-Scale Contrastive and Generative Pretraining

Jan 27, 2026

Video Compression with Hierarchical Temporal Neural Representation

Jan 25, 2026

Frequency-aware Neural Representation for Videos

Jan 25, 2026

Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders

Jan 15, 2026

Nightmare Dreamer: Dreaming About Unsafe States And Planning Ahead

Jan 08, 2026

Omni2Sound: Towards Unified Video-Text-to-Audio Generation

Jan 06, 2026

Vidarc: Embodied Video Diffusion Model for Closed-loop Control

Dec 19, 2025

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

Dec 18, 2025