Hengshuang Zhao

Sonata: Self-Supervised Learning of Reliable Point Representations

Mar 20, 2025

Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation

Mar 11, 2025

Effective LLM Knowledge Learning via Model Generalization

Mar 05, 2025

HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation

Jan 24, 2025

DiffDoctor: Diagnosing Image Diffusion Models Before Treating

Jan 21, 2025

VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control

Jan 07, 2025

GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models

Jan 03, 2025

DreamMask: Boosting Open-vocabulary Panoptic Segmentation with Synthetic Data

Jan 03, 2025

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

Dec 24, 2024

FashionComposer: Compositional Fashion Image Generation

Dec 19, 2024