Hengshuang Zhao

VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control

Jan 07, 2025

GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models

Jan 03, 2025

DreamMask: Boosting Open-vocabulary Panoptic Segmentation with Synthetic Data

Jan 03, 2025

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

Dec 24, 2024

FashionComposer: Compositional Fashion Image Generation

Dec 19, 2024

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics

Dec 10, 2024

Liquid: Language Models are Scalable Multi-modal Generators

Dec 05, 2024

SyncVIS: Synchronized Video Instance Segmentation

Dec 01, 2024

Efficient 3D Perception on Multi-Sweep Point Cloud with Gumbel Spatial Pruning

Nov 12, 2024

One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection

Nov 03, 2024