Hengshuang Zhao

Efficient 3D Perception on Multi-Sweep Point Cloud with Gumbel Spatial Pruning

Nov 12, 2024
Via arXiv

One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection

Nov 03, 2024
Via arXiv

UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation

Oct 14, 2024
Via arXiv

VIRT: Vision Instructed Transformer for Robotic Manipulation

Oct 09, 2024
Via arXiv

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Sep 26, 2024
Via arXiv

LION: Linear Group RNN for 3D Object Detection in Point Clouds

Jul 25, 2024
Via arXiv

Point Transformer V3 Extreme: 1st Place Solution for 2024 Waymo Open Dataset Challenge in Semantic Segmentation

Jul 21, 2024
Via arXiv

LogoSticker: Inserting Logos into Diffusion Models for Customized Generation

Jul 18, 2024
Via arXiv

ViLLa: Video Reasoning Segmentation with Large Language Model

Jul 18, 2024
Via arXiv

OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces

Jul 16, 2024
Via arXiv