Jun Zhu

Tsinghua University

Nightmare Dreamer: Dreaming About Unsafe States And Planning Ahead

Jan 08, 2026

Omni2Sound: Towards Unified Video-Text-to-Audio Generation

Jan 06, 2026

Vidarc: Embodied Video Diffusion Model for Closed-loop Control

Dec 19, 2025

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

Dec 18, 2025

Motus: A Unified Latent Action World Model

Dec 15, 2025

Part-X-MLLM: Part-aware 3D Multimodal Large Language Model

Nov 17, 2025

Imagine in Space: Exploring the Frontier of Spatial Intelligence and Reasoning Efficiency in Vision Language Models

Nov 16, 2025

TetraJet-v2: Accurate NVFP4 Training for Large Language Models with Oscillation Suppression and Outlier Control

Oct 31, 2025

Effective and Stealthy One-Shot Jailbreaks on Deployed Mobile Vision-Language Agents

Oct 09, 2025

DiffusionNFT: Online Diffusion Reinforcement with Forward Process

Sep 19, 2025