Hengshuang Zhao

ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement

Apr 03, 2025
Via arXiv

Empowering Large Language Models with 3D Situation Awareness

Mar 29, 2025
Via arXiv

Sonata: Self-Supervised Learning of Reliable Point Representations

Mar 20, 2025
Via arXiv

Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation

Mar 11, 2025
Via arXiv

Effective LLM Knowledge Learning via Model Generalization

Mar 05, 2025
Via arXiv

HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation

Jan 24, 2025
Via arXiv

DiffDoctor: Diagnosing Image Diffusion Models Before Treating

Jan 21, 2025
Via arXiv

VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control

Jan 07, 2025
Via arXiv

DreamMask: Boosting Open-vocabulary Panoptic Segmentation with Synthetic Data

Jan 03, 2025
Via arXiv

GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models

Jan 03, 2025
Via arXiv