Zilong Huang

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Jan 07, 2025

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Oct 13, 2024

CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis

Aug 27, 2024

Lightweight Model Pre-training via Language Guided Knowledge Distillation

Jun 17, 2024

Depth Anything V2

Jun 13, 2024

DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

May 28, 2024

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Jan 19, 2024

Harnessing Diffusion Models for Visual Perception with Meta Prompts

Dec 22, 2023

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

Jul 17, 2023

Disentangled Pre-training for Image Matting

Apr 03, 2023